Abstract

Stein's method of exchangeable pairs is examined through five examples in relation to Poisson and 
normal distribution approximation. In particular, in the case where the exchangeable pair is constructed 
from a reversible Markov chain, we analyze how modifying the step size of the chain in a natural way 
affects the error term in the approximation acquired through Stein's method. It has been noted for the 
normal approximation that smaller step sizes may yield better bounds, and we obtain the first rigorous 
results that verify this intuition. For the examples associated to the normal distribution, the bound on 
the error is expressed in terms of the spectrum of the underlying chain, a characteristic of the chain 
related to convergence rates. The Poisson approximation using exchangeable pairs is less studied than 
the normal, but in the examples presented here the same principles hold. 

1 Introduction 

Stein's method has become a powerful tool in approximating probability distributions and proving central 
limit theorems. The various formulations of the method rely on exploiting the characterizing operator or
"Stein equation" of the distribution. The characterizing operator of a random variable $X$ is an operator $S$
such that, for a specified class of functions $A$, $E\,Sf(Y) = 0$ for all $f$ in $A$ if and only if $Y \stackrel{d}{=} X$. Stein's
method can be used to quantitatively bound the difference between two random variables, one of which 
has a known characterizing operator. In this paper we use theorems that obtain error terms from the 
characterizing operator through exchangeable pairs. There are other variations of Stein's method that 
exploit the characterizing operator in different ways, for example the zero bias transformation [531 HI], the 
size bias coupling 01 [71 [2S] , dependency graphs [1] [5] , and other ad hoc methods [HI El HH [2] . 

For a gentle, intuitive explanation of Stein's method for normal approximation see [31]. A more rigorous
introduction can be found in [36] and similar ideas with more references in [29]. To find an introduction to
Stein's method of exchangeable pairs for Poisson approximation see [13]. For a very thorough introduction
to Stein's method in general see [6].

An exchangeable pair is a pair of identically distributed random variables $(W, W')$ with the property that
the distribution of $(W, W')$ is equal to the distribution of $(W', W)$. The typical method used to generate
useful exchangeable pairs on a finite space $\Omega$ is through a Markov chain $\{X_0, X_1, \ldots\}$ on $\Omega$ reversible with
respect to $\pi$. If $W$ is a random variable on $\Omega$, then it is easy to show that setting $W := W(X_0)$ and
$W' := W(X_1)$, where $X_0$ is chosen according to $\pi$, yields an exchangeable pair. This is not the only way to obtain
an exchangeable pair: for exchangeable pairs from non-reversible Markov chains see [19] and [30].
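The construction above can be checked mechanically on a toy chain. The following is a minimal sketch, not taken from the paper: the three-state matrices `P_rev` and `P_cyc` and the helper names are hypothetical. It uses the fact that the pair $(X_0, X_1)$ started from $\pi$ is exchangeable exactly when the joint law $\pi_i P_{ij}$ is symmetric, which is the detailed balance condition.

```python
from fractions import Fraction


def joint_law(pi, P):
    """Joint law of (X0, X1) when X0 ~ pi: joint[i][j] = pi[i] * P[i][j]."""
    n = len(pi)
    return [[pi[i] * P[i][j] for j in range(n)] for i in range(n)]


def is_exchangeable(pi, P):
    """True when the joint law is symmetric, i.e. detailed balance holds."""
    J = joint_law(pi, P)
    n = len(pi)
    return all(J[i][j] == J[j][i] for i in range(n) for j in range(n))


half, quarter = Fraction(1, 2), Fraction(1, 4)

# Hypothetical birth-death chain, reversible w.r.t. pi = (1/4, 1/2, 1/4).
pi = [quarter, half, quarter]
P_rev = [[half, half, 0], [quarter, half, quarter], [0, half, half]]

# Hypothetical cyclic chain: uniform stationary law, but not reversible.
uniform = [Fraction(1, 3)] * 3
P_cyc = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

With these inputs, `is_exchangeable(pi, P_rev)` holds while `is_exchangeable(uniform, P_cyc)` fails, which illustrates why reversibility (and not just stationarity) is what the construction relies on.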

We will examine how modifying the step size of the underlying Markov chain in a natural way affects 
the error term in the approximation acquired through Stein's method. In the case where the underlying
Markov chain is ergodic, the step size does not necessarily affect the rate of convergence to stationarity in a
monotone way. However, the rate of convergence is related to the eigenvalues of the Markov chain, and in 
the examples associated to the normal distribution we are able to express the bound on the error in terms 
of the eigenvalues. 

It will be obvious in the sequel that modifying the exchangeable pair has a profound effect on the error 
term. Most notably, in the theorems we use, Markov chains with larger steps require more computational 
work, and in the case of Poisson approximation, higher moment information. For the examples presented 
here, the chains that allow for the easiest computation always yield the best bound. For other examples it 
is difficult to compute the error term for any chain other than the most computationally simple in a form 
that yields information about the relative sizes of the bounds. Thus, it is difficult to take these examples 
and make a rigorous statement about the step size of the underlying Markov chain and the bound acquired 
from it in a general setting. 

Section 2 introduces Stein's method of exchangeable pairs for normal approximation and explains why the
effect of step size on the error term is not obvious. Sections 3, 4, and 5 each contain one example of
normal approximation by Stein's method of, respectively, the binomial (with $p = 1/2$), the Plancherel measure
of the Hamming scheme, and the Plancherel measure of the irreducible representations of a group $G$.
Section 6 is tangent to Sections 3, 4, and 5 in that the bounds on the error from those sections are restated
in terms of the eigenvalues of the chain. The final two sections examine the approximation of the binomial
and the negative binomial by the Poisson distribution.

This is a small step in examining and refining Stein's method to be more widely applicable. This paper 
serves to illustrate the type of computations needed to apply the method and will hopefully yield some 
insight into the underlying theory behind Stein's method. 



2 Normal Approximation 

For normal approximation, we use Stein's original theorem [36] not only because most other exchangeable 
pair formulations stem from it, but because in many situations it still yields the best results. Also, the 
theorem is stated in terms of the Kolmogorov metric, but the relative size of the bound on the error is 
determined by the same terms in an analogous theorem where the Wasserstein metric is used [15].

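Theorem 2.1. [36] Let $(W, W')$ be an exchangeable pair of real random variables such that $E(W'|W) = (1 - a)W$
with $0 < a < 1$. Also, let $E(W) = 0$ and $E(W^2) = 1$. Then for all $x_0$ in $\mathbb{R}$,

$$\left| P(W \le x_0) - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x_0} e^{-x^2/2}\,dx \right| \le \frac{\sqrt{\mathrm{Var}(E[(W'-W)^2|W])}}{a} + (2\pi)^{-1/4} \sqrt{\frac{E|W'-W|^3}{a}} \qquad (1)$$

$$\le \frac{\sqrt{\mathrm{Var}(E[(W'-W)^2|W])}}{a} + \left[\frac{E(W'-W)^4}{\pi a}\right]^{1/4}. \qquad (2)$$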



Remarks. 

1. The first bound (1) is from [36], while (2) follows from the Cauchy-Schwarz inequality and the
straightforward computation $E(W' - W)^2 = 2a\,\mathrm{Var}(W)$ for an exchangeable pair $(W, W')$ with
$E(W'|W) = (1-a)W$.

2. A result holds [30] which contains error terms similar to Theorem 2.1 and often yields better rates, but
requires almost sure bounds on $|W' - W|$.

3. It has been shown [53] that Theorem 2.1 still holds without exchangeability, assuming instead that $W$
and $W'$ are equally distributed. However, it is unclear how useful this observation is in practice as
the exchangeability plays a critical role in defining the pair $(W, W')$ and in computing the error from
Theorem 2.1.

It has been casually noted that the error term from Theorem 2.1 should be small when $a$ is small, or
equivalently when $W$ and $W'$ are "close." One line of thought that supports this idea is that if $(W, W')$ is
bivariate standard normal with covariance $(1 - a)$, then $\mathrm{Var}(E[(W' - W)^2|W]) = a^4\,\mathrm{Var}(W^2)$. This equation
loosely implies that smaller values of $a$ should yield better bounds if $(W, W')$ are nearly bivariate normal.
Another argument for the idea that the error term should be small when $W$ and $W'$ are close is that the bound
from Theorem 2.1 follows from a Taylor series approximation of $W'$ about $W$. A simple heuristic to illustrate
this argument can be found in and However, the Taylor approximation takes place exclusively in 
the numerator of the error term and so this argument does not take into account the denominator which 
also decreases when $W$ and $W'$ are close (the same observations can be made directly from the error term).
In accordance with these remarks, we will take the value $a$ generated by the exchangeable pair $(W, W')$ of
Theorem 2.1 to be a rough quantitative measure of the "step size" as referred to in the introduction.
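The bivariate normal identity quoted above can be verified in a few lines. The following worked derivation is a sketch under the stated assumption that $(W, W')$ is standard bivariate normal with covariance $1-a$; the auxiliary variable $Z$ is introduced here only for the computation and does not appear in the original text.

```latex
\begin{align*}
W' &= (1-a)\,W + \sqrt{2a - a^2}\;Z, \qquad Z \sim N(0,1) \text{ independent of } W, \\
W' - W &= -a\,W + \sqrt{2a - a^2}\;Z, \\
E\big[(W'-W)^2 \,\big|\, W\big] &= a^2 W^2 + (2a - a^2), \\
\mathrm{Var}\big(E[(W'-W)^2 \mid W]\big) &= a^4\,\mathrm{Var}(W^2) = 2a^4, \\
\frac{\sqrt{\mathrm{Var}\big(E[(W'-W)^2 \mid W]\big)}}{a} &= \sqrt{2}\,a .
\end{align*}
```

In this idealized case the first term of the bound in Theorem 2.1 is linear in $a$, which is the sense in which smaller $a$ "should" help; as the surrounding discussion stresses, this is only a heuristic.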

Finally, we remark here that the families of chains from Sections 4 and 5 have a similar form. Both of
these families of chains are canonically engineered to satisfy the linearity condition $E[W'|W] = (1 - a)W$.
To elaborate, if as described in the introduction, $(W, W') = (W(X_0), W(X_1))$ (that is, $\{X_0, X_1, \ldots\}$ is a
stationary Markov chain reversible with respect to some distribution $\pi$ on a state space $\Omega$ and $W$ is some
function from $\Omega$ into $\mathbb{R}$), then we have

$$E[W'|X_0 = i] = \sum_{j \in \Omega} P(X_1 = j \,|\, X_0 = i)\,W(j) = (PW)_i,$$

where $P$ is the transition matrix of the Markov chain and $W$ is the vector $W_i = W(i)$. Roughly, the linearity
condition implies

$$PW = (1 - a)W,$$

so that $W$ must be an eigenvector of the transition matrix $P$.
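To make the eigenvector condition concrete, here is a small sketch on an assumed example: the chain on $\{0, \ldots, n\}$ that tracks the number of ones when a single uniformly chosen coordinate of a 0-1 vector is flipped. The function names are hypothetical, and $W$ is taken unnormalized as $2x - n$, since rescaling an eigenvector does not change $a$.

```python
from fractions import Fraction


def one_flip_matrix(n):
    """Transition matrix for the number of ones after flipping one uniform coordinate."""
    P = [[Fraction(0)] * (n + 1) for _ in range(n + 1)]
    for x in range(n + 1):
        if x < n:
            P[x][x + 1] = Fraction(n - x, n)  # a zero was chosen
        if x > 0:
            P[x][x - 1] = Fraction(x, n)      # a one was chosen
    return P


def conditional_mean(P, W):
    """Vector with entries (PW)_i = E[W(X1) | X0 = i]."""
    return [sum(P[i][j] * W[j] for j in range(len(W))) for i in range(len(W))]


def linearity_constant(P, W):
    """Return a if PW = (1 - a) W holds exactly in every coordinate, else None."""
    PW = conditional_mean(P, W)
    a = None
    for pw, w in zip(PW, W):
        if w != 0:
            a = 1 - Fraction(pw) / w
            break
    if a is None:
        return None
    return a if all(pw == (1 - a) * w for pw, w in zip(PW, W)) else None
```

For `n = 6` and `W = [2 * x - 6 for x in range(7)]` the function returns `Fraction(1, 3)`, that is $a = 2/n$, whereas a function such as $x^2$ that is not an eigenvector yields `None`.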

In both of Sections 4 and 5 the random variable $W$ under study is a member of an orthogonal (in the
$L^2$ sense with respect to the measure $\pi$ under study) family of functions on the state space. From this point, it
is not difficult to canonically define a matrix indexed by $\Omega \times \Omega$, with row sums equal to one, which has the
orthogonal family as eigenvectors, and satisfies the detailed balance equations for the measure $\pi$ (although
the entries are not guaranteed to be positive). We omit a more detailed abstract formulation, but the main
ideas can be found in [35].

3 Binomial Distribution 

In this section, we examine the most basic example of Stein's method for normal approximation. It is 
well known that the binomial distribution with parameters n and p is approximately normal for large n. 
Stein's method can be used to obtain an error term in this approximation. The setup is as follows: let 
$(X_1, X_2, \ldots, X_n)$ be a random vector where each $X_i$ is independent and equal to 1 with probability $p$ ($0 < p < 1$)
and 0 with probability $(1 - p)$. Then take $X = \sum_{i=1}^n X_i$. In order to clearly illustrate the typical way we
will analyze the rest of the examples, we will take $p = 1/2$ in this section.

The first thing we need to study the error term from Theorem 2.1 is the family of Markov chains that
induce the exchangeable pair. Given a configuration of an $n$-dimensional 0-1 vector, the next step in the
chain follows the rule of changing any fixed $i$ coordinates with probability $b_i / \binom{n}{i}$, so that $\sum_{i=0}^n b_i = 1$ ($b_i$ is
the probability of moving to a configuration Hamming distance $i$ away). Since the probability of going from
a configuration $x$ to a configuration $y$ is the same as going from $y$ to $x$, these chains are clearly reversible with
respect to the law of the random vector $(X_1, \ldots, X_n)$ with each coordinate an independent Bernoulli random
variable with parameter $p = 1/2$. Setting $X = \sum_{i=1}^n X_i$ and $X' = \sum_{i=1}^n X_i'$, where $(X_1', \ldots, X_n')$ is a step in the chain
described above, induces an exchangeable pair. In order to apply Theorem 2.1 the random variable must
have mean 0 and variance 1. Set $W = \frac{X - n/2}{\sqrt{n/4}} = \frac{X - np}{\sqrt{np(1-p)}}$ and define $W'$ as $W$ but with $X'$ in place of
$X$. The final hypothesis from Theorem 2.1 is the following lemma.
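Lemma 3.1.

$$E(W'|W) = \left(1 - 2\sum_{i=1}^n \frac{i}{n}\, b_i\right) W.$$

Proof. Let $X = \sum_{i=1}^n X_i$ and $X' = \sum_{i=1}^n X_i'$ as defined above. Then

$$E(X' - X\,|\,X) = \sum_{i=1}^n E(X_i' - X_i\,|\,X)$$

$$= -\sum_{i: X_i = 1} P(X_i' = 0\,|\,X_i = 1) + \sum_{i: X_i = 0} P(X_i' = 1\,|\,X_i = 0)$$

$$= (n - 2X) \sum_{i=1}^n \frac{i}{n}\, b_i.$$

Substituting $X = \sqrt{n/4}\,W + n/2$ and $X' = \sqrt{n/4}\,W' + n/2$ yields $E(W' - W\,|\,W) = -2\sum_{i=1}^n \frac{i}{n}\, b_i\, W$,
which is the lemma. □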
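Lemma 3.1 can be confirmed by exact enumeration for small $n$. The sketch below is an illustration only and assumes the chain is as described in this section: with probability $b_i$ a uniformly chosen set of $i$ coordinates is flipped. It builds the induced chain on $X$ (the number of ones) and checks the drift identity $E(X' - X \,|\, X) = (n - 2X)\sum_i (i/n) b_i$, which is equivalent to the lemma. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def count_chain(n, b):
    """Transition matrix of X = number of ones; b[i] = P(flip a uniform set of i coordinates)."""
    P = [[Fraction(0)] * (n + 1) for _ in range(n + 1)]
    for x in range(n + 1):
        for i, bi in enumerate(b):
            # k = number of ones among the i flipped coordinates (hypergeometric)
            for k in range(max(0, i - (n - x)), min(i, x) + 1):
                p = Fraction(comb(x, k) * comb(n - x, i - k), comb(n, i))
                P[x][x + i - 2 * k] += bi * p
    return P


def drift(P, x):
    """E[X' - X | X = x]."""
    return sum(p * (y - x) for y, p in enumerate(P[x]))


def lemma_3_1_holds(n, b):
    """Check E[X' - X | X = x] = (n - 2x) * A for every x, with A = sum_i (i/n) b_i."""
    A = sum(Fraction(i, n) * bi for i, bi in enumerate(b))
    P = count_chain(n, b)
    return all(drift(P, x) == (n - 2 * x) * A for x in range(n + 1))
```

For example, `lemma_3_1_holds(5, [Fraction(0), Fraction(1, 2), Fraction(1, 4), Fraction(1, 4), Fraction(0), Fraction(0)])` evaluates the identity for a chain mixing steps of size one, two, and three.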

Now Theorem 2.1 can be applied with $a = 2\sum_{i=1}^n \frac{i}{n}\, b_i$. In order to apply the theorem, we still need to
compute the quantities $\mathrm{Var}(E[(W' - W)^2|W])$ and $E(W' - W)^4$.

Lemma 3.2.

$$\mathrm{Var}(E[(W' - W)^2|W]) = 16B^2\,\mathrm{Var}(W^2),$$

where $B = \sum_{i=1}^n \frac{i(i-1)}{n(n-1)}\, b_i$.

Proof.

$$E[(X'-X)^2|X] = \sum_{i=1}^n E[(X_i'-X_i)^2|X] + 2\sum_{i: X_i=1}\ \sum_{j: X_j=0} E[(X_i'-X_i)(X_j'-X_j)|X]$$

$$+ \sum_{i: X_i=1}\ \sum_{j \ne i: X_j=1} E[(X_i'-X_i)(X_j'-X_j)|X] + \sum_{i: X_i=0}\ \sum_{j \ne i: X_j=0} E[(X_i'-X_i)(X_j'-X_j)|X]$$

$$= nA + \big(X(X-1) + (n-X)(n-X-1) - 2X(n-X)\big)B,$$

where $A = \sum_{i=1}^n \frac{i}{n}\, b_i$. Substituting $X = \sqrt{n/4}\,W + n/2$ and $X' = \sqrt{n/4}\,W' + n/2$ into the equation and
solving appropriately yields $E[(W' - W)^2|W] = 4BW^2 + C$ where $C$ is some constant. Taking variances
proves the lemma. □

Lemma 3.3.

$$E(W' - W)^4 = \frac{16}{n}\Big(A + 3B(n-1) + \big(2 - 3n + nE(W^4)\big)D\Big),$$

where $A = \sum_{i=1}^n \frac{i}{n}\, b_i$, $B$ is defined as in Lemma 3.2 and $D$ is a constant.

Proof. Let $Y_i = X_i' - X_i$. Then

$$E[(X'-X)^4|X] = E\Big[\Big(\sum_{i=1}^n Y_i\Big)^4 \Big| X\Big] = \sum_i E[Y_i^4|X] + 4\sum_i \sum_{j \ne i} E[Y_i^3 Y_j|X]$$

$$+ 3\sum_i \sum_{j \ne i} E[Y_i^2 Y_j^2|X] + 6\sum_i \sum_{j \ne i}\ \sum_{l \ne i,j} E[Y_i^2 Y_j Y_l|X] + \sum_{i,j,k,l \text{ distinct}} E[Y_i Y_j Y_k Y_l|X].$$

Counting the number of each type and computing the expected values in a manner similar to the proof
of Lemma 3.2 yields

$$E[(X'-X)^4|X] = nA + (4B + 6(n-2)C)\big(n(n-1) - 4X(n-X)\big)$$

$$+ \big[n(n-1)(n-2)(n-3) - 8\big((n-X)X(X-1)(X-2) + X(n-X)(n-X-1)(n-X-2)\big)\big]D + 3n(n-1)B.$$

Here $C$ and $D$ are constants that vanish in the final expression, but $C$ is the probability of any fixed three
coordinates changing, and $D$ is the same probability but with four coordinates. Substituting in $W$ and $W'$
as in the previous lemmas and taking the expected value of $E[(W' - W)^4|W]$ implies the lemma. □

Lemma 3.4. $E[W^4] = 3 - \frac{2}{n}$.

Proof. The moment generating function for $W$ is

$$\phi(t) = \left(\frac{\exp(t/\sqrt{n})}{2} + \frac{\exp(-t/\sqrt{n})}{2}\right)^n,$$

and $E[W^4] = \phi^{(4)}(0)$. □
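As a sanity check on Lemma 3.4, the fourth moment can also be computed exactly from the binomial probabilities rather than from the moment generating function. This is a minimal sketch (the function name is hypothetical); it uses $W = (2X - n)/\sqrt{n}$ for $X$ binomial with parameters $n$ and $1/2$, so that $W^4 = (2X-n)^4/n^2$.

```python
from fractions import Fraction
from math import comb


def fourth_moment(n):
    """Exact E[W^4] for W = (X - n/2) / sqrt(n/4) with X ~ Binomial(n, 1/2)."""
    total = sum(comb(n, x) * (2 * x - n) ** 4 for x in range(n + 1))
    return Fraction(total, 2 ** n * n ** 2)
```

Comparing `fourth_moment(n)` with `3 - Fraction(2, n)` over a range of `n` reproduces the lemma without any differentiation.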



Now that we have all the formulas needed to apply Theorem 2.1 we can prove the main result of this
section.

Theorem 3.5. Using the family of reversible Markov chains described previously, the error term given by
Theorem 2.1 is minimum for $b_0 + b_1 = 1$ (and $b_0 \ne 1$). In this case the bound is $\left[\frac{8}{\pi n}\right]^{1/4}$.

Proof. Because $a$ is positive, it is sufficient to verify that $\frac{\mathrm{Var}(E[(W'-W)^2|W])}{a^2}$ and $\frac{E(W'-W)^4}{a}$ are minimum for
$b_0 + b_1 = 1$. First we examine $\frac{\mathrm{Var}(E[(W'-W)^2|W])}{a^2}$. By Lemmas 3.1, 3.2, and 3.4 and the fact that $a = 2A$,
this term is equal to

$$8\,\frac{n-1}{n}\,\frac{B^2}{A^2},$$

which is non-negative and equal to zero when $b_0 + b_1 = 1$.

For the second term we use the second formulation in Theorem 2.1 and find the minimum of $\frac{E(W'-W)^4}{a}$.
By Lemmas 3.1, 3.3, and 3.4, $\frac{E(W'-W)^4}{a}$ is equal to

$$\frac{8}{n}\left(1 + 3(n-1)\frac{B}{A}\right),$$

which is also minimized when $b_0 + b_1 = 1$, and is equal to $8/n$ in this case. □
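The two quantities compared in the proof depend on the chain only through $A$ and $B$, so different step distributions can be compared directly. The sketch below assumes the closed forms $8\frac{n-1}{n}(B/A)^2$ and $\frac{8}{n}(1 + 3(n-1)B/A)$ for the two terms, with $A = \sum_i (i/n) b_i$ and $B = \sum_i \frac{i(i-1)}{n(n-1)} b_i$; these follow from Lemmas 3.1 to 3.4 but the function names are hypothetical and the code is an illustration, not part of the original argument.

```python
from fractions import Fraction


def A_and_B(n, b):
    """A = P(a fixed coordinate changes), B = P(two fixed coordinates both change)."""
    A = sum(Fraction(i, n) * bi for i, bi in enumerate(b))
    B = sum(Fraction(i * (i - 1), n * (n - 1)) * bi for i, bi in enumerate(b))
    return A, B


def error_terms(n, b):
    """Return (Var(E[(W'-W)^2 | W]) / a^2, E(W'-W)^4 / a) for step distribution b."""
    A, B = A_and_B(n, b)
    ratio = B / A
    first = 8 * Fraction(n - 1, n) * ratio ** 2
    second = Fraction(8, n) * (1 + 3 * (n - 1) * ratio)
    return first, second


def unit_step(n, t):
    """Step distribution that always changes exactly t coordinates."""
    return [Fraction(1) if i == t else Fraction(0) for i in range(n + 1)]
```

For `n = 10`, `error_terms(10, unit_step(10, 1))` gives `(0, 4/5)`, while always changing two coordinates gives `(4/45, 16/5)`: both terms grow as soon as $B > 0$, which is the content of Theorem 3.5.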
Remarks. 

1. It is important to notice that the error term from Theorem 2.1 depends on the Markov chain only
through the term $B/A$, where $B = \sum_{i=1}^n \frac{i(i-1)}{n(n-1)}\, b_i$ and $A = \sum_{i=1}^n \frac{i}{n}\, b_i$. We will be using this fact
later in Section 5.

2. The case $b_0 + b_1 = 1$ corresponds to the Markov chain that holds with probability $b_0$ and changes
one coordinate chosen uniformly at random with probability $b_1$. Therefore, this chain has the smallest
maximum step size over all the chains in the family under study. Also, as mentioned in the previous
section, a quantitative measure of step size is the associated value of $a$ from Theorem 2.1, which is
equal to $2b_1/n$ in this optimal case. Restricting to the case where $b_0 = 0$, the chain yielding the best
bound ($b_1 = 1$) has the smallest value of $a$. This restriction is not artificial; in light of the previous
remark, the chain generated with $P(T = t) = b_t'$, where $b_0' = 0$ and $b_t' = b_t/(1 - b_0)$ for $t \ge 1$, has the
same error term from Theorem 2.1 as the chain with $P(T = t) = b_t$. In other words, manipulating the
holding probability changes the parameter $a$, but cannot yield bounds better than those obtained by
choosing $b_0 = 0$.

4 Plancherel Distribution of the Hamming Scheme (Or the Bino- 
mial Distribution in General) 

In this section we examine the uniform distribution on the eigenvalues of the adjacency matrix for the 
Hamming graph. The Hamming graph $H(n, q)$ has vertex set $X$ equal to $n$-tuples of $\{1, 2, \ldots, q\}$ (thus
$|X| = q^n$) with an edge between two vertices if they differ in exactly one coordinate.

The following information about the adjacency matrix of the hamming scheme can be found in [51 in 
the more generalized setting of association schemes. Let $v_i = (q - 1)^i \binom{n}{i}$, the number of vertices that differ
from a fixed vertex by $i$ coordinates. The eigenvalues of $H(n, q)$ are $K_1(i) = n(q - 1) - qi$ with multiplicity
$v_i$ for $i = 0, 1, 2, \ldots, n$. Choose $i$ with probability $\frac{v_i}{|X|}$ and designate this the Plancherel distribution of the
Hamming scheme. Let $W(i) = \frac{K_1(i)}{\sqrt{v_1}}$, a random variable with unit variance. In order to define the family of
Markov chains that induce the exchangeable pairs, we must define the q-Krawtchouk polynomials:

$$K_j(i) = \sum_{l=0}^{j} (-1)^l (q-1)^{j-l} \binom{i}{l} \binom{n-i}{j-l}.$$

Here and in what follows we freely use the convention $\binom{m}{r} = 0$ for $r > m$ or $r < 0$. Following [22], define



r=0 
For a given $T$ in $\{1, \ldots, n\}$, define a Markov chain on $\{0, \ldots, n\}$ by the transition probability of moving from
$i$ to $j$ as $L_T(i, j)$. Then $L_T(i, j)$ is a Markov chain on $\{0, \ldots, n\}$ reversible with respect to the Plancherel
distribution above ($P(i) = \frac{v_i}{|X|}$). Following the usual setup, choose $i$ from the Plancherel distribution and
then $j$ with probability $L_T(i, j)$ and set the exchangeable pair $(W, W') = (W(i), W(j))$. In [22], the author
uses Theorem 2.1 with the exchangeable pair induced by the chain $L_1$ (defined by (4) with $T = 1$) to obtain
a bound on the difference between the normal distribution and the Plancherel measure. We will show that
over a large family of Markov chains, $L_1$ is the most local chain and the error term obtained in Theorem 2.1
using $L_1$ (as was done in [22]) is optimal over this family.

Another (somewhat more motivating) way of viewing the Hamming scheme is given in the following easily 
verified proposition. 

Proposition 4.1. [22] For W defined as above, W is equal in distribution to a binomial distribution with
parameters n and P — ^ normalized to have mean and variance 1. 

Thus the Plancherel distribution of the Hamming scheme can be defined as a binomial distribution. We 
will redefine the Markov chain LT{i,j) in terms of this characterization, but first we list some well known 
properties of Krawtchouk polynomials found in [28].



Lemma 4.2. For j,l £ {0, . . . , n}, 



Lemma 4.3. For j,l £ {0, . . . , n} 



_ \X\^ 



$$\sum_{i=0}^{n} v_i K_j(i) K_l(i) = |X|\, v_j\, \delta_{j,l}.$$



Lemma 4.4. For $j \in \{0, \ldots, n\}$,

$$(j + 1)K_{j+1}(i) = \big((n - j)(q - 1) + j - qi\big)K_j(i) - (q - 1)(n - j + 1)K_{j-1}(i).$$

Lemma 4.5. For $i, j \in \{0, \ldots, n\}$,

$$K_j(i) = (q-1)^{j-i}\, \frac{\binom{n}{j}}{\binom{n}{i}}\, K_i(j).$$
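The Krawtchouk identities above are easy to test numerically. The following sketch assumes the standard definition $K_j(i) = \sum_l (-1)^l (q-1)^{j-l}\binom{i}{l}\binom{n-i}{j-l}$ and $v_i = (q-1)^i\binom{n}{i}$; the function names are hypothetical. It relies on `math.comb` returning 0 when the lower index exceeds the upper one, which matches the binomial convention stated earlier for the non-negative indices used here.

```python
from math import comb


def v(n, q, i):
    """Number of vertices of H(n, q) at Hamming distance i from a fixed vertex."""
    return (q - 1) ** i * comb(n, i)


def krawtchouk(n, q, j, i):
    """q-Krawtchouk polynomial K_j(i) for 0 <= i, j <= n."""
    return sum(
        (-1) ** l * (q - 1) ** (j - l) * comb(i, l) * comb(n - i, j - l)
        for l in range(j + 1)
    )


def orthogonality_holds(n, q):
    """Check sum_i v_i K_j(i) K_l(i) = q^n v_j delta_{j,l} for all j, l."""
    for j in range(n + 1):
        for l in range(n + 1):
            s = sum(v(n, q, i) * krawtchouk(n, q, j, i) * krawtchouk(n, q, l, i)
                    for i in range(n + 1))
            if s != (q ** n * v(n, q, j) if j == l else 0):
                return False
    return True


def recurrence_holds(n, q):
    """Check the three-term recurrence of Lemma 4.4 for 1 <= j <= n - 1."""
    return all(
        (j + 1) * krawtchouk(n, q, j + 1, i)
        == ((n - j) * (q - 1) + j - q * i) * krawtchouk(n, q, j, i)
        - (q - 1) * (n - j + 1) * krawtchouk(n, q, j - 1, i)
        for j in range(1, n) for i in range(n + 1)
    )
```

The symmetry of Lemma 4.5 can be checked in the cross-multiplied form `krawtchouk(n, q, j, i) * v(n, q, i) == krawtchouk(n, q, i, j) * v(n, q, j)`, which avoids fractions.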



Finally, we need one more tool that equates the product of two Krawtchouk polynomials with a linear 
combination of Krawtchouk polynomials. 



Lemma 4.6. For i, j, r e {0, . . . , n}. 



where 



K,{r)K,{r)^ A,.,+i{i)K,+i{r). 

l = -3 



{:i;)tM-^^^ ^ Aj-m (9-1) 
Proof. The first thing to note is that the Krawtchouk polynomials $K_i(r)$ for $i$ non-negative integers form a
basis for all polynomials in $r$, so that such a decomposition exists. Now, fix $i$ and write $A_{j,i+l}(i) = A_{j,i+l}$.
For $j = 0$, $A_{0,i} = 1$ and $A_{0,i+l} = 0$ for $l \ne 0$, which agrees with the lemma.
Also, since $K_1(r) = (q - 1)(n - r) - r = (q - 1)n - qr$, Lemma 4.4 implies

$$K_1(r)K_i(r) = i(q - 2)K_i(r) + (i + 1)K_{i+1}(r) + (q - 1)(n - i + 1)K_{i-1}(r), \qquad (5)$$

which is also consistent with the lemma (the equality ([5]) was shown in 2'i, 1. 
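For $j \ge 1$, we use Lemma 4.4 and strong induction to obtain

$$(j + 1)K_{j+1}(r)K_i(r) = \big((n - j)(q - 1) + j - qr\big)K_j(r)K_i(r) - (q - 1)(n - j + 1)K_{j-1}(r)K_i(r)$$

$$= \big((n - j)(q - 1) + j - qr\big) \sum_{l=-j}^{j} A_{j,i+l}K_{i+l}(r) - (q - 1)(n - j + 1) \sum_{l=-(j-1)}^{j-1} A_{j-1,i+l}K_{i+l}(r)$$

$$= \big((n - j)(q - 1) + j\big) \sum_{l=-j}^{j} A_{j,i+l}K_{i+l}(r) - (q - 1)(n - j + 1) \sum_{l=-(j-1)}^{j-1} A_{j-1,i+l}K_{i+l}(r) \qquad (6)$$

$$+ \sum_{l=-j}^{j} \Big[(i + l + 1)K_{i+l+1}(r) + (q - 1)(n - i - l + 1)K_{i+l-1}(r) - \big((n - i - l)(q - 1) + l + i\big)K_{i+l}(r)\Big] A_{j,i+l},$$

where the final equality is by Lemma 4.4. For each $l$, the coefficient of $K_{i+l}(r)$ in (6) is

$$(i + l - j)(q - 2)A_{j,i+l} + (q - 1)(n - i - l)A_{j,i+l+1} \qquad (7)$$

$$+\ (i + l)A_{j,i+l-1} - (q - 1)(n - j + 1)A_{j-1,i+l}. \qquad (8)$$

The lemma will follow if the expression above is equal to $(j + 1)A_{j+1,i+l}$. To see this fact, re-index the sums
in the definition of $A_{j,l}$ in Lemma 4.6 to begin at one, and equate summands. □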



The fact from the previous lemma that the coefficients in the linear expansion are positive for q > 2
has been shown without explicit computation in [18] and restated in the monograph. We use that the
coefficients are positive in the next theorem which shows they can be used to define a probability distribution. 

Theorem 4.7. For $q \ge 2$, the Markov chain $L_T(i, j)$ defined on $\{0, \ldots, n\}$ has the same transition probabilities
as the following chain: Given a 0-1 $n$-tuple with $i$ ones, choose $T$ coordinates at random. Replace every
zero coordinate chosen with a one and for each one coordinate chosen, replace it with a zero with probability
$\frac{1}{q-1}$ and let the coordinate remain as a one with probability $\frac{q-2}{q-1}$. The probability of going from $i$ ones to $j$
ones is $L_T(i, j)$.
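The combinatorial chain in Theorem 4.7 can be built explicitly and tested against the properties claimed for $L_T$. The sketch below is an illustration under an explicit assumption: the two replacement probabilities, which are elided in some copies of the text, are taken to be $1/(q-1)$ (one becomes zero) and $(q-2)/(q-1)$ (one stays one), which is the choice consistent with the $q = 2$ case where every chosen coordinate changes. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def coordinate_chain(n, q, T):
    """Transition matrix on the number of ones for the chain described in Theorem 4.7."""
    P = [[Fraction(0)] * (n + 1) for _ in range(n + 1)]
    r = Fraction(1, q - 1)  # assumed probability that a chosen one becomes a zero
    for i in range(n + 1):
        # k = number of zeros among the T chosen coordinates (hypergeometric)
        for k in range(max(0, T - i), min(T, n - i) + 1):
            pick = Fraction(comb(n - i, k) * comb(i, T - k), comb(n, T))
            ones = T - k
            for m in range(ones + 1):  # m of the chosen ones become zeros
                flip = comb(ones, m) * r ** m * (1 - r) ** (ones - m)
                P[i][i + k - m] += pick * flip
    return P


def is_reversible(n, q, P):
    """Detailed balance w.r.t. the Plancherel weights v_i = (q-1)^i C(n, i)."""
    v = [(q - 1) ** i * comb(n, i) for i in range(n + 1)]
    return all(v[i] * P[i][j] == v[j] * P[j][i]
               for i in range(n + 1) for j in range(n + 1))


def linearity_holds(n, q, T, P):
    """Check E[K_1(j) | i] = (K_1(T) / v_1) K_1(i), where K_1(x) = n(q-1) - qx."""
    K1 = [n * (q - 1) - q * x for x in range(n + 1)]
    factor = Fraction(K1[T], n * (q - 1))
    return all(sum(P[i][j] * K1[j] for j in range(n + 1)) == factor * K1[i]
               for i in range(n + 1))
```

Under the stated assumption, the chain is reversible with respect to the Plancherel distribution and satisfies the linearity condition with $a = qT/(n(q-1))$, in agreement with the role $L_T$ plays in the rest of this section.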
Proof. By summing over k equal to the number of zeros in the T coordinates chosen, the probability of going
from i to i + I in the chain described is 



Also, 



E 

k=0 



T-k\ /(g-2r-2^-+'\ /(T)(t1J 



k~lj V {q-lV-'' J \ (?) 



Lt{i.j) - ^ 2. ^5 



r=0 



(r) 



V 



T Tl 

E ^T,j+/ X!'"'■'^»+'('')■^J('')■ 



r=0 



The first equality is by Lemma 4.5 and the second is by Lemma 4.6.
Applying Lemma 4.3 implies

By Lemma 4.6 this equals the desired quantity. □

According to Proposition 4.1 the restriction $q \ge 2$ corresponds to a binomial distribution with parameters
$n$ and $p$ with $1/2 \le p < 1$. However, after normalizing to have mean equal to zero and unit variance, a
binomial random variable with parameters $n$ and $(1 - p)$ is the negative of a binomial random variable with
parameters $n$ and $p$. Therefore, because the normal distribution is symmetric about zero, the following
analysis can be applied to any binomial random variable.

The chains defined by (4) have now been fully described in terms of the binomial distribution. Now we
will start to examine the error term from Theorem 2.1. The next lemma shows the quantity $a$ from Theorem
2.1 is equal to $\frac{qT}{n(q-1)}$ and so is increasing as a function of $T$. Also, Theorem 4.7 implies that the maximum
step size of $L_T$ is $T$; smaller values of $T$ make $W$ and $W'$ "closer" in both senses described in Section 2.

Lemma 4.8. [22] $E[W'|W] = \left(\frac{K_1(T)}{v_1}\right) W$.

Proof. 



1 " 

. /l)t ^ — ^ 



1 ^ Kr{T)Kr{i) 

-0 

W{i) 



Ki{T) \ 
■"1 ) 
The first equality is by definition and the second uses the orthogonality relation of the Krawtchouk
polynomials directly. Since conditioning on $i$ only depends on $i$ through $W$, the lemma is proven. □



For this example, the error term (in the more general setting of association schemes) has already been
computed using Theorem 2.1 in [22], so we only state the result we need. First we must define a function $p_2$
on $\{0, \ldots, n\}$ using the following description. Start from a fixed $X_0$ in $X$ and choose a coordinate uniformly
at random. Replace the chosen coordinate by one of the remaining $q - 1$ options different from the original
value uniformly at random to obtain $X_1$. Perform the same operation on $X_1$ to obtain $X_2$ and define $p_2(j)$
to be the probability that $X_2$ has $j$ coordinates different from $X_0$. Thus $p_2(0) = \frac{1}{n(q-1)}$, $p_2(1) = \frac{q-2}{n(q-1)}$,
$p_2(2) = \frac{n-1}{n}$, and $p_2(j) = 0$ for $j \ge 3$.
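The values of $p_2$ follow from a short case analysis (the second step either revisits the coordinate changed in the first step or not), and they can be confirmed by brute-force enumeration on a small Hamming graph. The sketch below is illustrative; the function name is hypothetical and vertices are encoded as tuples over $\{0, \ldots, q-1\}$ rather than $\{1, \ldots, q\}$.

```python
from fractions import Fraction


def two_step_distance_law(n, q):
    """Exact law of the Hamming distance between X0 and X2 for the walk on H(n, q)."""
    x0 = (0,) * n
    step = Fraction(1, n * (q - 1))  # probability of each single move

    def neighbors(x):
        # All vertices differing from x in exactly one coordinate.
        for c in range(n):
            for val in range(q):
                if val != x[c]:
                    yield x[:c] + (val,) + x[c + 1:]

    law = [Fraction(0)] * (n + 1)
    for x1 in neighbors(x0):
        for x2 in neighbors(x1):
            d = sum(a != b for a, b in zip(x0, x2))
            law[d] += step * step
    return law
```

For `n = 4` and `q = 3` the function returns the law `[1/8, 1/8, 3/4, 0, 0]`, matching $\frac{1}{n(q-1)}$, $\frac{q-2}{n(q-1)}$, and $\frac{n-1}{n}$.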



Theorem 4.9. Let $W$ and $a$ be defined as above and let $T$ be fixed in $\{1, \ldots, n\}$. Then for all real $x_0$,



¥{W < Xq) 



/2Tr 



'dx 



< 



^P2{jr (K,{T) ^ ^ 2Ki{T) 



Vl 



ri/4 



J=0 



1/4 



Armed with this theorem, we can examine how varying the value of $T$ affects the error term. However,
rather than examining the error for a fixed value of $T$, we define $T$ to be a random variable on $\{0, \ldots, n\}$
with $P(T = t) = b_t$ (and $b_0 \ne 1$). Note that this modification does not affect the stationary distribution of
the chain.

There are two reasons for using a random variable instead of a fixed value of $T$. The first reason is that
using a random variable yields a larger family of Markov chains. The second reason is that when $q = 2$, the
Plancherel distribution is a binomial distribution with parameters $n$ and $p = 1/2$. In this case, Theorem 4.7
implies the chain $L_T(i, j)$ chooses $t$ coordinates with probability $b_t$ and changes them all. That is, $L_T(i, j)$
follows the rule of moving a (Hamming) distance $t$ away with probability $b_t$ and we recover the chain from
Section 3.

We will show that the random variable $T$ that minimizes the error term from Theorem 2.1 is induced by
the Markov chain used in [22] which has $b_1 = 1$ (or alternatively $b_0 + b_1 = 1$ and $b_0 \ne 1$) and all other $b_t = 0$.
The computation of the error term with $T$ as a random variable closely follows the proof of Theorem 4.9
from [22]. To apply Theorem 2.1 we need to compute the terms in the bound on the error. It will be helpful
to define $(W_t, W_t')$ to be the exchangeable pair defined in the usual way from the Markov chain $L_t(i, j)$. Note
that $W_t = W$ since using a different Markov chain in the family does not alter the stationary distribution.

The next two lemmas will generate all the terms needed to apply Theorem 2.1.

Lemma 4.10. 



E[iW;-Wtn]^vi±^^I^^^P2{r) 



r=0 



1 - 



2Ki{t) 



Vl 



VariE[iWi-Wtm) = vlJ2 



Kr{t) 2Ki{t) 



Vl 



E[(VK; - Wtf\ = vl 



E 



1 - 



Ki{t) 

Vl 



Wt 



-6 1- 



Kr(t)\ \ P2{r) 
Lemma 4.11.



r=0 



VarmW -Wf\i]) = vlY,^^\^ + i2^: 



Kr{t) 2Ki{t) 



Vl 



g [^[x - SoMlW j _ 6(^1 - SoMlW^^ ^ 



Proof. We prove only the first equality; the proofs of the remaining are similar. By summing over t equal to
the value of T chosen, we have



^ n n 

The equality now follows from the proof of Lemma 4.8. □

Applying Lemma 4.11 to Theorem 2.1 proves the following theorem.



Theorem 4.12. If $W$ and $\{b_t\}_{t=0}^n$ are defined as above, $a = 1 - \sum_{t=0}^n \frac{b_t K_1(t)}{v_1}$, and the other variables are
as in Theorem 4.9, then for all real $x_0$,

$P(W \le x_0) -$



1 

dx 



'2tt 



\ 



ri/4 



1/4 



We can now analyze the error term of Theorem 4.12 over $L_T(i, j)$, the family of Markov chains previously
defined.

Theorem 4.13. The Markov chain $L_T(i, j)$ that minimizes the error term from Theorem 4.12 is the chain
with $P(T = 1) = b_1 = 1$.

Proof. The right hand side of the inequality of Theorem 4.12 can be rewritten as



We will show that 





n 




E 










^ ^1/4 



P2{j) 



n2 



{El 




1/4 



(9) 



(10) 



(11) 



is minimum for each $j$ under some set $B = \{b_t\}_{t=0}^n$, which implies (10) is minimum under $B$ (notice $v_j > 0$
for all $j$). We will then show that the minimum value of (11) is no less than $-2$ for each $j$, so that (9) is
also minimum under $B$.

Since $p_2(j) = 0$ for $j \ge 3$, we only consider $j = 0, 1, 2$. For $j = 0$, $K_0(t) = v_0 = 1$, so that (11) does not
depend on $t$. For $j = 1$, the numerator of (11) is equal to $-a$ so that (11) is independent of $t$. For $j = 2$, a
straightforward calculation (using $\sum_{t=0}^n b_t = 1$) yields



- 1 



2 = 



En 1 t{t — l) 



(12) 



Since the summand in the numerator is 0 when $t = 1$ and positive for all other $t$ values (where $b_t$ is positive),
the final term in (12) is minimum for $b_1 + b_0 = 1$ and $b_t = 0$ for $t > 1$. □

Remarks. 

1. Similar to Section 3, Theorem 4.7 implies that the case $b_0 + b_1 = 1$ corresponds to the Markov chain
$L_T$ having step size at most one and with associated value of $a$ from Theorem 2.1 equal to $\frac{q b_1}{n(q-1)}$.
Among chains with $b_0 = 0$ the chain yielding the best bound has the smallest maximum step size and
the smallest value of $a$. We refer to the remarks following Theorem 3.5 on this restriction.

2. It is interesting to note that the size of the bound on the error in this section and in Section 3 both
depend on the underlying chain through the same term ($B/A$ from Section 3). This is obvious from
the case $q = 2$, since this case is the same in both sections, but it is not clear why this should carry
over to other values of $q$.



5 Plancherel Distribution on a Group 

In this section we examine the Plancherel measure of the random walk generated by the conjugacy class of
transpositions on $S_n$ (the symmetric group on $n$ symbols). First we describe the setup for any group in order
to state the theorems we will use in the utmost generality. Let $G$ be a group and $C$ be a nontrivial conjugacy
class of $G$ such that $C = C^{-1}$. Define the random walk on $G$ generated by $C$, a Markov chain with state
space $G$, as follows: given $g$ in $G$, the next step in the chain is $gh$ where $h$ is chosen uniformly at random
from $C$. Now, denote the set of irreducible representations of $G$ by $Irr(G)$. From [17], for each character
$\chi^\lambda$ of $\lambda$ in $Irr(G)$, there is an eigenvalue of the random walk on $G$ generated by $C$ given by $\frac{\chi^\lambda(C)}{\dim(\lambda)}$ (termed a
character ratio) occurring with multiplicity $\dim(\lambda)^2$. A well known fact from representation theory is (see
for example [34])

$$\sum_{\lambda \in Irr(G)} \dim(\lambda)^2 = |G|,$$

so that we define the Plancherel measure of $G$ to choose $\lambda$ in $Irr(G)$ with probability $\frac{\dim(\lambda)^2}{|G|}$. Then the unit
variance random variable $W(\lambda) = \frac{|C|^{1/2}\chi^\lambda(C)}{\dim(\lambda)}$ is essentially an eigenvalue of the random walk above chosen
uniformly at random.
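For $G = S_n$, the case specialized to later in this section, the Plancherel measure can be tabulated directly. The sketch below is an illustration that goes slightly beyond the text: it uses the hook length formula for $\dim(\lambda)$, a standard fact about $S_n$ that is not stated in this paper, and the function names are hypothetical. It confirms that $\sum_\lambda \dim(\lambda)^2 = n!$, so that the weights $\dim(\lambda)^2/|G|$ sum to one.

```python
from fractions import Fraction
from math import factorial


def partitions(n, largest=None):
    """Yield the partitions of n as weakly decreasing tuples."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def dimension(lam):
    """Dimension of the irreducible representation of S_n indexed by lam (hook length formula)."""
    n = sum(lam)
    # Column lengths of the diagram (the conjugate partition).
    conj = [sum(1 for part in lam if part > c) for c in range(lam[0])] if lam else []
    hooks = 1
    for r, part in enumerate(lam):
        for c in range(part):
            arm = part - c - 1
            leg = conj[c] - r - 1
            hooks *= arm + leg + 1
    return factorial(n) // hooks


def plancherel(n):
    """Plancherel measure of S_n: lam -> dim(lam)^2 / n!."""
    return {lam: Fraction(dimension(lam) ** 2, factorial(n)) for lam in partitions(n)}
```

For instance, `plancherel(3)` assigns mass 1/6, 2/3, and 1/6 to the partitions (3), (2, 1), and (1, 1, 1).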

In [22] the author defines a Markov chain reversible with respect to the Plancherel measure of a group $G$
with the probability of transitioning from $\lambda$ to $\rho$ given by

$$L_\tau(\lambda, \rho) = \frac{\dim(\rho)}{\dim(\lambda)\dim(\tau)}\, \frac{1}{|G|} \sum_{g \in G} \chi^\lambda(g)\, \chi^\tau(g)\, \overline{\chi^\rho(g)}.$$

Here $\tau$ is some representation of $G$. Following the usual setup, define an exchangeable pair $(W, W')$ by
$W = W(\lambda)$ where $\lambda$ is chosen from the Plancherel measure of $G$, and define $W' = W(\rho)$ where $\rho$ is given
by one step in the above chain. For the sake of continuity, we will postpone discussing properties of this
Markov chain family until later in this section.

Using facts from representation theory, [22] proves the following lemma which allows for the application
of Theorem 2.1.



Lemma 5.1.

$$E[W'|W] = \left(\frac{\chi^\tau(C)}{\dim(\tau)}\right) W.$$

Theorem 5.2. [22] Let $C$ be a conjugacy class of a finite group $G$ such that $C = C^{-1}$ and fix a nontrivial
irreducible representation $\tau$ of $G$ whose character is real valued. Let $\lambda$ be a random irreducible representation
chosen from the Plancherel measure of $G$. Let $W = \frac{|C|^{1/2}\chi^\lambda(C)}{\dim(\lambda)}$. Then for all real $x_0$,

$$\left| P(W \le x_0) - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x_0} e^{-x^2/2}\,dx \right| \le \cdots$$

Here the sums are over conjugacy classes $K$ of $G$, $a_\tau = 1 - \frac{\chi^\tau(C)}{\dim(\tau)}$, and $p_2(K)$ is the probability that the
random walk on $G$ generated by $C$ started at the identity is at the conjugacy class $K$ after two steps.

After proving this theorem, the author applies it with the choice of the irreducible representation $\tau$
corresponding to the partition $(n - 1, 1)$ (more on this notation later) to obtain a central limit theorem with
an error term in the case where $G = S_n$ and $C$ is the conjugacy class of $i$-cycles. We will show that this
choice of $\tau$ minimizes the error term from Theorem 5.2 in the special case where $C$ is the conjugacy class of
transpositions (2-cycles).

Before going further we will explain the notation above and state some facts about the irreducible
representations of $S_n$ found in [34]. For $S_n$, the irreducible representations are indexed by partitions of
the integer $n$ where the partition $(n)$ corresponds to the trivial representation. Another way to index the
irreducible representations (which can be more useful for combinatorial reasons) is to associate to each
partition a "tableau" which is a left justified array of equally sized, aligned boxes (we will typically abuse
notation and refer to $\lambda$ as both the partition and the diagram). For a partition $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_k)$ of $n$,
where $\lambda_1 \ge \ldots \ge \lambda_k > 0$, the associated tableau has $\lambda_1$ boxes in the first row, $\lambda_2$ boxes in the second row,
and so on. Now, numbering the boxes one through $n$ left to right, top to bottom (with this indexing, this is
technically the largest Standard Young Tableau), we make the following definition.

Definition. Let $\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_k)$ be a partition of $n$. Define the content of box $i$ (in the labeling above) of
$\lambda$ to be $c_\lambda(i) = \mathrm{column}_\lambda(i) - \mathrm{row}_\lambda(i)$.

In order to clarify these definitions, Table 1 is the labeled tableau for $\lambda = (5, 3, 3, 1)$, a partition of $n = 12$.

     1    2    3    4    5
     6    7    8
     9   10   11
    12

Table 1: Labeled tableau for $\lambda = (5, 3, 3, 1)$.

Table 2 below is the same tableau, but the box labeled $i$ above now has the value of the content $c_\lambda(i)$.

     0    1    2    3    4
    -1    0    1
    -2   -1    0
    -3

Table 2: $\lambda = (5, 3, 3, 1)$ with box $i$ labeled $c_\lambda(i)$.

Because we are specializing to the conjugacy class $C$ of transpositions and $P_2(K)$ is the probability of
being in the conjugacy class $K$ after two steps in the random walk on $S_n$ generated by $C$ starting at the
identity element, $P_2(K) = 0$ for many conjugacy classes. The following lemma formulates $\chi^\lambda(K)/\dim(\lambda)$ in terms
of the contents of $\lambda$ for conjugacy classes $K$ where $P_2(K) \neq 0$ (the conjugacy classes that contribute to the
error term in Theorem 5.2). The lemma is proved in the form shown here using Murphy's elements in [16],
but can also be found in terms of the $\lambda_i$ in [26].

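Lemma 5.3. [16] Let $\chi^\lambda_{(j)}$ and $\chi^\lambda_{(2,2)}$ be the character of the irreducible representation $\lambda$ at the
conjugacy class of a $j$-cycle and two 2-cycles, respectively. Then

\[ \frac{\chi^\lambda_{id}}{\dim(\lambda)} = 1, \]
\[ \frac{\chi^\lambda_{(2)}}{\dim(\lambda)} = \frac{2}{n(n-1)} \sum_{i=1}^n c_\lambda(i), \]
\[ \frac{\chi^\lambda_{(3)}}{\dim(\lambda)} = \frac{3}{n(n-1)(n-2)} \left( \sum_{i=1}^n c_\lambda^2(i) - \binom{n}{2} \right), \]
\[ \frac{\chi^\lambda_{(2,2)}}{\dim(\lambda)} = \frac{4}{n(n-1)(n-2)(n-3)} \left( \Big( \sum_{i=1}^n c_\lambda(i) \Big)^2 - 3 \sum_{i=1}^n c_\lambda^2(i) + 2\binom{n}{2} \right). \]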

For the example from Tables 1 and 2 where $\lambda = (5, 3, 3, 1)$, the sum of the contents is equal to four and
$n$ is equal to twelve. Thus Lemma 5.3 implies, for example, $\chi^\lambda_{(2)}/\dim(\lambda) = 2/33$.
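
The content computation above is easy to mechanize. The following is a minimal Python sketch, not taken from the paper: the function names `contents` and `character_ratios` are hypothetical, and the formula used for the class (2,2) is the standard content expression, assumed here to agree with Lemma 5.3. It requires $n \geq 4$ so that all binomial coefficients are nonzero.

```python
from fractions import Fraction
from math import comb


def contents(partition):
    """Contents c(i) = column - row of every box, read row by row."""
    return [col - row
            for row, length in enumerate(partition, start=1)
            for col in range(1, length + 1)]


def character_ratios(partition):
    """Normalized characters chi(K)/dim for K = (2), (3), (2,2).

    Uses the content formulas; note n(n-1)(n-2)/3 = 2*C(n,3) and
    n(n-1)(n-2)(n-3)/4 = 6*C(n,4).
    """
    n = sum(partition)
    c = contents(partition)
    s1 = sum(c)                    # sum of contents
    s2 = sum(x * x for x in c)     # sum of squared contents
    return {
        "(2)": Fraction(s1, comb(n, 2)),
        "(3)": Fraction(s2 - comb(n, 2), 2 * comb(n, 3)),
        "(2,2)": Fraction(s1 * s1 - 3 * s2 + 2 * comb(n, 2), 6 * comb(n, 4)),
    }


example = (5, 3, 3, 1)
print(sum(contents(example)))            # sum of the contents in Table 2
print(character_ratios(example)["(2)"])  # normalized character at a transposition
```

For $(n-1,1)$ the same function can be compared against the fixed-point formula (number of fixed points minus one, divided by $n-1$) as a sanity check.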

We pause here to discuss some relevant properties of the chain $L_\lambda$. First, we reiterate the remarks at
the end of Section 2: the definition of the family of Markov chains stems from the orthogonality relations
of irreducible characters (see [H] and the references therein). Second, because it is not known how to
combinatorially describe $L_\lambda$ in general, it is not obvious how to relate the choice of $\lambda$ to step size. However,
some representations do have nice descriptions and we can use these to gain some intuition. From [20], if
$\nu$ is the defining representation on $S_n$, then given an irreducible representation $\mu$, $L_\nu$ follows the rule of
removing a corner box of $\mu$ uniformly at random (so that the resulting diagram of $n - 1$ boxes remains a
tableau) and moving it to a uniformly chosen concave open position of the altered diagram (to obtain a new
tableau of size $n$). For $\rho$ the trivial representation, $\nu$ the defining representation, and $\tau := (n-1, 1)$, we
have [34] $\chi^\nu = \chi^\rho + \chi^\tau$, $\dim(\rho) = 1$, $\dim(\nu) = n$, and $\dim(\tau) = n - 1$ so that

\[ L_\tau = \frac{n L_\nu - L_\rho}{n - 1}. \]

Now, using the orthogonality relations of characters and the fact that $\chi^\rho \equiv 1$ it is easy to see the chain
$L_\rho$ holds with probability one. Thus, the previous remarks imply that from an irreducible representation
$\mu$, the chain $L_\tau$ moves at most one box of $\mu$. Also, again using orthogonality relations, it follows that for
any irreducible representations $\mu$ and $\lambda$, we have $L_\lambda(\rho, \mu) = \delta_{\lambda\mu}$. That is, from the trivial representation
the chain $L_\lambda$ moves to $\lambda$ with probability one, which will move more than one box if $\lambda \neq \tau$. Lemma 5.3
implies that $W(\mu) = \binom{n}{2}^{-1/2} \sum_{i=1}^n c_\mu(i)$, so that moving more boxes of a partition to obtain $W'$ from $W$
corresponds to a larger step size. This is one sense in which $\tau$ has the smallest step size among nontrivial
irreducible representations.

As discussed previously, a more quantitative measure of step size of the chain $L_\lambda$ is the value of $a_\lambda$ from
Theorem 5.2. From Lemma 5.3 we have

\[ a_\lambda = 1 - \frac{\chi^\lambda_{(2)}}{\dim(\lambda)} = 1 - \frac{2}{n(n-1)} \sum_{i=1}^n c_\lambda(i). \]

By the definition of $c_\lambda(i)$, it is easy to see that $a_\lambda$ is minimum over nontrivial irreducible representations for
$\lambda = \tau$, and strictly increases as boxes of a partition are moved from higher to lower rows (this observation
motivates Lemma 5.4 below). This is another sense in which $\tau$ has the smallest step size among nontrivial
irreducible representations.






Now, analogous to Section [H in order to show that the error term from Theorem 5.2 is minimum over
nontrivial irreducible representations at $\tau = (n-1, 1)$, we will show that for each conjugacy class $K$ from
Lemma 5.3 the term $T_K(\lambda) = \left( \frac{\chi^\lambda_K}{\dim(\lambda)} - 1 \right) / a_\lambda$ is minimum and at least negative two for $\lambda = \tau$. Moreover,
we will define an ordering on all irreducible representations such that the largest nontrivial irreducible
representation in the ordering is $(n-1, 1)$ and then show that $T_K$ is a decreasing function with respect to
the ordering. This non-standard ordering is a coarsening of the usual dominance ordering found in [34].

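Definition. Let $\lambda = (\lambda_1, \ldots, \lambda_k)$ and $\mu = (\mu_1, \ldots, \mu_m)$ be two irreducible representations of $S_n$. We say $\lambda$
succeeds $\mu$, denoted $\lambda \succeq \mu$, if $\lambda_1 \geq \mu_1$, $\lambda_i = \mu_i$ for $i = 2, \ldots, k-1$, and $\lambda_k \leq \mu_k$. For $i > m$ ($i > k$), take
$\mu_i = 0$ ($\lambda_i = 0$).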

It is obvious that $(n-1, 1) = \tau \succeq \lambda$ for all nontrivial irreducible representations $\lambda$, so that to prove the
claims above it is enough to show that $\lambda \succeq \mu$ implies $T_K(\lambda) \leq T_K(\mu)$ for each $K$ from Lemma 5.3. The
following lemma is a simpler characterization of the succession relation.

Lemma 5.4. For two irreducible representations $\lambda$ and $\mu$ of $S_n$, $\lambda \succeq \mu$ if and only if $\mu$ can be obtained
from $\lambda$ only by moving blocks of the associated tableau of $\lambda$ from the top row to either the bottom row of
$\lambda$ or to start a new row.

Proof. If the defining relations of succession hold, and the two representations are not equal, then either
$\lambda_k < \mu_k$ or $\mu_{k+1} \neq 0$. In the former case move a block from row one to row $k$, in the latter start a new row.
Inductively, continuing in this way will eventually lead to $\mu$ since the new tableau still succeeds $\mu$.

Conversely, if $\mu$ can be made from $\lambda$ with the described method, then clearly the defining relations of
succession must hold. □

Lemma 5.4 states that in order to prove $\lambda \succeq \mu$ implies $T_K(\lambda) \leq T_K(\mu)$ (thus that $\tau = (n-1, 1)$ minimizes
the error term), it is enough to prove the statement in the case $\mu = (\lambda_1 - 1, \lambda_2, \ldots, \lambda_k + 1)$; where in order
to cover both actions described in Lemma 5.4 we break convention and permit $\lambda_k = 0$ (only if $\lambda_{k-1} \neq 0$).
We make one final (non-standard) definition before we begin the proof of the claims above.

Definition. Let $\lambda = (\lambda_1, \ldots, \lambda_k)$ and $\mu = (\lambda_1 - 1, \lambda_2, \ldots, \lambda_k + 1)$. Define a new tableau, the joint of $\lambda$ and
$\mu$, by

\[ \lambda|\mu = (\lambda_1 - 1, \lambda_2, \ldots, \lambda_k). \]

Now that we have suitable definitions, we can begin to prove the lemmas that will be used to obtain the 
main result. 

Lemma 5.5. For any tableau $\lambda$, $T_{id}(\lambda) = 0$ and $T_{(2)}(\lambda) = -1$.

Proof. The first assertion follows from the fact that for any representation, the identity element is mapped to
the identity matrix so that $\chi^\lambda(id) = \dim(\lambda)$. The second is obtained directly from the definition of $a_\lambda$. □

The next lemma states a simpler criterion for determining which of $T_{(3)}(\lambda)$ and $T_{(3)}(\mu)$ is larger.

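Lemma 5.6. With $\lambda$ and $\mu$ as above, $T_{(3)}(\mu) - T_{(3)}(\lambda)$ is non-negative if and only if $f(\lambda)$ (defined below) is
also non-negative.

\[ f(\lambda) = (\lambda_1 + \lambda_k - k) \left( \sum_{i=1}^{n-1} c_{\lambda|\mu}(i) - \binom{n}{2} \right) + (\lambda_1 - 1)(\lambda_k + 1 - k) - \sum_{i=1}^{n-1} c_{\lambda|\mu}^2(i) + \binom{n}{2} + 2\binom{n}{3}. \]

Proof.

\[ T_{(3)}(\mu) - T_{(3)}(\lambda) = \frac{ \left( 1 - \frac{1}{2\binom{n}{3}} \Big( \sum_i c_\lambda^2(i) - \binom{n}{2} \Big) \right) a_\mu - \left( 1 - \frac{1}{2\binom{n}{3}} \Big( \sum_i c_\mu^2(i) - \binom{n}{2} \Big) \right) a_\lambda }{ a_\lambda a_\mu }. \qquad (13) \]

Notice that $\sum_i c_\lambda(i)$ is maximized for $\lambda = (n)$ in which case it is $\binom{n}{2}$ and strictly less for all other tableau.
Thus, for all tableau $\lambda$ under consideration, $a_\lambda > 0$ so that the non-negativity of (13) is determined by the
numerator. The numerator of (13) is equal to

\[ \frac{\sum_i \big( c_\mu^2(i) - c_\lambda^2(i) \big)}{2\binom{n}{3}} + \frac{\sum_i \big( c_\lambda(i) - c_\mu(i) \big)}{\binom{n}{2}}
 + \frac{ \sum_i c_\lambda^2(i) \sum_j c_\mu(j) - \sum_i c_\mu^2(i) \sum_j c_\lambda(j) + \binom{n}{2} \sum_i \big( c_\lambda(i) - c_\mu(i) \big) }{ 2\binom{n}{3}\binom{n}{2} }. \qquad (14) \]

Rewriting the sums $\sum_i c_\lambda^t(i) = \sum_i c_{\lambda|\mu}^t(i) + (\lambda_1 - 1)^t$ and $\sum_i c_\mu^t(i) = \sum_i c_{\lambda|\mu}^t(i) + (\lambda_k + 1 - k)^t$ for $t = 1, 2$
and then simplifying implies (14) is equal to

\[ \frac{(\lambda_k + 1 - k)^2 - (\lambda_1 - 1)^2}{2\binom{n}{3}} + \frac{(\lambda_1 - 1) - (\lambda_k + 1 - k)}{\binom{n}{2}}
 + \frac{ \big( (\lambda_1 - 1)^2 - (\lambda_k + 1 - k)^2 \big) \sum_i c_{\lambda|\mu}(i) }{ 2\binom{n}{3}\binom{n}{2} } \]
\[ + \frac{ (\lambda_1 - 1)(\lambda_k + 1 - k) \big( (\lambda_1 - 1) - (\lambda_k + 1 - k) \big) }{ 2\binom{n}{3}\binom{n}{2} }
 - \frac{ \big( (\lambda_1 - 1) - (\lambda_k + 1 - k) \big) \sum_i c_{\lambda|\mu}^2(i) }{ 2\binom{n}{3}\binom{n}{2} }
 + \frac{ \binom{n}{2} \big( (\lambda_1 - 1) - (\lambda_k + 1 - k) \big) }{ 2\binom{n}{3}\binom{n}{2} }. \]

Multiplying by $2\binom{n}{3}\binom{n}{2} / \big( (\lambda_1 - 1) - (\lambda_k + 1 - k) \big)$ proves the lemma (since $\lambda_1 > \lambda_k$ and $k \geq 2$, this
multiplication does not affect the non-negativity of the term). □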

Now we can prove the following lemma. 

Lemma 5.7. Let $\lambda$ and $\mu$ be two irreducible representations on the symmetric group. If $\lambda \succeq \mu$, then

\[ T_{(3)}(\lambda) \leq T_{(3)}(\mu). \]

Proof. Lemma 5.6 implies that in order to show the monotonicity of $T_{(3)}$ with respect to the succession
relation, we only need to show that $f(\lambda) \geq 0$ for all tableau $\lambda = (\lambda_1, \ldots, \lambda_k)$ with $\lambda_1 > \lambda_2$ and $\lambda_k < \lambda_{k-1}$
(where $\lambda_k$ may possibly be zero). In order to prove the lemma, we use induction on $n$, the number being
partitioned. In each of the three cases below, we associate to each tableau $\lambda$ of size $n + 1$ a tableau $\phi$ of size
$n$ where $f(\phi) \geq 0$ by the induction hypothesis. It can be easily verified that in the case of $n = 3$, the only
nontrivial tableau satisfying $\lambda_1 > \lambda_2$ and $\lambda_k < \lambda_{k-1}$ is $\lambda = (2, 1, 0)$, and $f(\lambda) \geq 0$.

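Case 1: $\lambda_k \neq 0$.
For this case, let $\phi = (\lambda_1, \ldots, \lambda_{k-1}, \lambda_k - 1)$ be a partition of $n$ and let $\psi$ be equal to the partition
$(\lambda_1 - 1, \ldots, \lambda_{k-1}, \lambda_k)$. Then we have

\[ f(\lambda) = (\lambda_1 + \lambda_k - k) \left( \sum_{i=1}^{n} c_{\lambda|\mu}(i) - \binom{n+1}{2} \right) + (\lambda_1 - 1)(\lambda_k + 1 - k) - \sum_{i=1}^{n} c_{\lambda|\mu}^2(i) + \binom{n+1}{2} + 2\binom{n+1}{3}, \]
\[ f(\phi) = (\lambda_1 + \lambda_k - 1 - k) \left( \sum_{i=1}^{n-1} c_{\phi|\psi}(i) - \binom{n}{2} \right) + (\lambda_1 - 1)\big( (\lambda_k - 1) + 1 - k \big) - \sum_{i=1}^{n-1} c_{\phi|\psi}^2(i) + \binom{n}{2} + 2\binom{n}{3}. \]

Since $f(\phi)$ is non-negative by the induction hypothesis, we will show that $f(\phi) - f(\lambda) \leq 0$, which will
prove the lemma for this case. Simplifying this expression using the identity $\binom{n+1}{j} = \binom{n}{j} + \binom{n}{j-1}$ and the
fact that

\[ \sum_{i=1}^{n-1} c_{\phi|\psi}^t(i) = \sum_{i=1}^{n} c_{\lambda|\mu}^t(i) - (\lambda_k - k)^t, \qquad t = 1, 2, \]

we obtain

\[ f(\phi) - f(\lambda) = (n-1)(\lambda_1 - 1) + (n - \lambda_1 + 1)(\lambda_k - k) - \sum_{i=1}^{n} c_{\lambda|\mu}(i) - \binom{n}{2}. \]

Now we bound the sum of the contents;

\[ \sum_{i=1}^{n} c_{\lambda|\mu}(i) \geq \sum_{i=0}^{\lambda_1 - 2} i + \sum_{i=1}^{\lambda_k} (i - k) + \big( n - (\lambda_1 - 1) - \lambda_k \big)\big( 1 - (k-1) \big) \]
\[ = \frac{(\lambda_1 - 1)(\lambda_1 - 2)}{2} + \frac{\lambda_k(\lambda_k + 1 - 2k)}{2} + (n - \lambda_1 + 1 - \lambda_k)(2 - k). \]

The inequality follows because the first term on the right hand side is the sum of the contents of row one,
the second term is the sum of the contents of row $k$, and the third term is the number of boxes not in rows
one or $k$ times the minimum content of those boxes.

The inequality above shows that $f(\phi) - f(\lambda) \leq g(\lambda_1, \lambda_k)$ where the function $g$ is defined by:

\[ g(\lambda_1, \lambda_k) = (n-1)(\lambda_1 - 1) + (n - \lambda_1 + 1)(\lambda_k - 2) - \frac{(\lambda_1 - 1)(\lambda_1 - 2)}{2} - \frac{\lambda_k(\lambda_k - 3)}{2} - \binom{n}{2}. \]

Let $V = \{ (\lambda_1, \lambda_k) : \lambda_k \leq n + 1 - \lambda_1, \ 1 \leq \lambda_k \leq \lambda_1 - 2 \}$ and notice that the allowable values of $(\lambda_1, \lambda_k)$
are a subset of $V$. A straightforward (but tedious) analysis shows the maximum of $g$ over the domain $V$ is
non-positive, which proves the lemma for this case.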
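
The "straightforward (but tedious)" claim about $g$ can at least be spot-checked numerically. The sketch below is illustrative only and is no substitute for the analysis: it evaluates $g$ exactly on the integer points of $V$ for small $n$. The helper names `g` and `max_g` are hypothetical, and the domain is read with non-strict inequalities ($\lambda_k \leq n + 1 - \lambda_1$ and $1 \leq \lambda_k \leq \lambda_1 - 2$), which is an assumption about the intended definition.

```python
from fractions import Fraction
from math import comb


def g(n, lam1, lamk):
    """The bounding function g(lambda_1, lambda_k) from Case 1, in exact arithmetic."""
    return (Fraction((n - 1) * (lam1 - 1) + (n - lam1 + 1) * (lamk - 2))
            - Fraction((lam1 - 1) * (lam1 - 2), 2)
            - Fraction(lamk * (lamk - 3), 2)
            - comb(n, 2))


def max_g(n):
    """Maximum of g over the integer points of V, or None if V has none."""
    values = [g(n, lam1, lamk)
              for lam1 in range(3, n + 1)
              for lamk in range(1, lam1 - 1)      # 1 <= lamk <= lam1 - 2
              if lamk <= n + 1 - lam1]
    return max(values) if values else None


for n in range(3, 12):
    print(n, max_g(n))
```

On the boundary $\lambda_k = n + 1 - \lambda_1$ the function evaluates to zero, so the bound is attained there and cannot be improved within this argument.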

Case 2: $\lambda_k = 0$, $\lambda_{k-1} \neq 1$.
In this case, let $\phi = (\lambda_1, \ldots, \lambda_{k-1} - 1, 0)$ and then let $\psi$ be equal to the partition $(\lambda_1 - 1, \ldots, \lambda_{k-1} - 1, 1)$.
Then let

\[ f(\phi) = (\lambda_1 - k) \left( \sum_{i=1}^{n-1} c_{\phi|\psi}(i) - \binom{n}{2} \right) + (\lambda_1 - 1)(1 - k) - \sum_{i=1}^{n-1} c_{\phi|\psi}^2(i) + \binom{n}{2} + 2\binom{n}{3} \]

and notice $f(\phi) \geq 0$ by the induction hypothesis. After simplifying using the fact that

\[ \sum_{i=1}^{n-1} c_{\phi|\psi}^t(i) = \sum_{i=1}^{n} c_{\lambda|\mu}^t(i) - (\lambda_{k-1} - k + 1)^t, \qquad t = 1, 2, \]

we obtain

\[ f(\phi) - f(\lambda) = n(\lambda_1 - k) + (\lambda_{k-1} - \lambda_1 + 1)(\lambda_{k-1} - k + 1) - n^2. \qquad (15) \]

The right hand side of equation (15) is increasing in $\lambda_1$, so that the maximum is attained for $\lambda_1 = n$. In
this case it is easy to see (15) is non-positive.



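Case 3: $\lambda = (\lambda_1, \ldots, \lambda_{k-2}, 1, 0)$.

In this case, let $\phi = (\lambda_1, \ldots, \lambda_{k-2}, 0)$ and let $\psi$ be equal to the partition $(\lambda_1 - 1, \ldots, \lambda_{k-2}, 1)$. Analogously
to the previous two cases, define

\[ f(\phi) = \big( \lambda_1 - (k-1) \big) \left( \sum_{i=1}^{n-1} c_{\phi|\psi}(i) - \binom{n}{2} \right) + (\lambda_1 - 1)(2 - k) - \sum_{i=1}^{n-1} c_{\phi|\psi}^2(i) + \binom{n}{2} + 2\binom{n}{3}, \]

which implies

\[ f(\phi) - f(\lambda) = (n + k)(\lambda_1 - n) + \sum_{i=1}^{n-1} c_{\phi|\psi}(i) - \binom{n}{2} + 3 - 2k - \lambda_1. \]

Using that $\lambda_1 \leq n$ and thus $\sum_{i=1}^{n-1} c_{\phi|\psi}(i) \leq \binom{n-1}{2}$, it is easy to see the term is negative.

□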



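Now we move on to the final conjugacy class needed to prove the result. As before we have the following
lemma which states a simpler criterion for determining which of $T_{(2,2)}(\lambda)$ and $T_{(2,2)}(\mu)$ is larger.

Lemma 5.8. With $\lambda$ and $\mu$ as above, $T_{(2,2)}(\mu) - T_{(2,2)}(\lambda)$ is non-negative if and only if $h(\lambda)$ (defined below)
is also non-negative.

\[ h(\lambda) = 2(\lambda_1 + \lambda_k - k) \left( \binom{n}{2} - \sum_{i=1}^{n-1} c_{\lambda|\mu}(i) \right) - 2(\lambda_1 - 1)(\lambda_k + 1 - k) \]
\[ \qquad - 2\binom{n}{2} \sum_{i=1}^{n-1} c_{\lambda|\mu}(i) + \left( \sum_{i=1}^{n-1} c_{\lambda|\mu}(i) \right)^2 + 3 \sum_{i=1}^{n-1} c_{\lambda|\mu}^2(i) + 6\binom{n}{4} - 2\binom{n}{2}. \]

Proof.

\[ T_{(2,2)}(\mu) - T_{(2,2)}(\lambda) = \frac{ \left( 1 - \frac{\chi^\lambda_{(2,2)}}{\dim(\lambda)} \right) a_\mu - \left( 1 - \frac{\chi^\mu_{(2,2)}}{\dim(\mu)} \right) a_\lambda }{ a_\lambda a_\mu }. \qquad (16) \]

As in Lemma 5.6, the non-negativity of (16) is determined by the numerator. Using the formulas from
Lemma 5.3 and rearranging, the numerator of (16) is equal to

\[ \frac{Q_\mu - Q_\lambda}{6\binom{n}{4}} + \frac{\sum_i \big( c_\lambda(i) - c_\mu(i) \big)}{\binom{n}{2}} + \frac{ Q_\lambda \sum_i c_\mu(i) - Q_\mu \sum_i c_\lambda(i) }{ 6\binom{n}{4}\binom{n}{2} }, \]

where we abbreviate $Q_\nu = \big( \sum_i c_\nu(i) \big)^2 - 3 \sum_i c_\nu^2(i) + 2\binom{n}{2}$ for $\nu = \lambda, \mu$.
Analogous to Lemma 5.6 for 3-cycles, simplifying the term by rewriting $\sum_i c_\lambda^t(i) = \sum_i c_{\lambda|\mu}^t(i) + (\lambda_1 - 1)^t$
and $\sum_i c_\mu^t(i) = \sum_i c_{\lambda|\mu}^t(i) + (\lambda_k + 1 - k)^t$ for $t = 1, 2$ yields a term proportional to $h(\lambda)$. In this case, the
constant of proportionality is $6\binom{n}{4}\binom{n}{2} / \big( (\lambda_1 - 1) - (\lambda_k + 1 - k) \big)$, which is positive and so does not affect
the non-negativity. □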

Now we can prove the following lemma. 

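Lemma 5.9. Let $\lambda$ and $\mu$ be two irreducible representations on the symmetric group. If $\lambda \succeq \mu$, then

\[ T_{(2,2)}(\lambda) \leq T_{(2,2)}(\mu). \]

Proof. We will exactly follow the strategy of the proof of Lemma 5.7 with the function $h$ replacing $f$.

Case 1: $\lambda_k \neq 0$.

For this case, let $\phi = (\lambda_1, \ldots, \lambda_{k-1}, \lambda_k - 1)$, a partition of $n$, and $\psi$ be equal to the partition
$(\lambda_1 - 1, \ldots, \lambda_{k-1}, \lambda_k)$. Then following the proof of Lemma 5.7,

\[ \big( h(\phi) - h(\lambda) \big)/2 = (\lambda_1 - 1)(\lambda_k - k + 1) + \binom{n}{2}(\lambda_k - k - 1) + (n - \lambda_k + k + 1) \sum_{i=1}^{n} c_{\lambda|\mu}(i) - n(\lambda_1 + \lambda_k - k - 1) - 3\binom{n}{3} \]
\[ \leq (\lambda_1 - 1)(\lambda_k - k + 1) + \binom{n}{2}(\lambda_k - k - 1) + (n - \lambda_k + k + 1)\binom{n-1}{2} - n(\lambda_1 + \lambda_k - k - 1) - 3\binom{n}{3} \]
\[ = (\lambda_1 - 2)(\lambda_k - k) - \lambda_1(n - 1) \leq 0. \]

Here the first inequality follows because $n - \lambda_k + k + 1 > 0$ and $\sum_i c_{\lambda|\mu}(i) \leq \binom{n-1}{2}$, and the final inequality
from $\lambda_k - k \leq n - 1$ and $\lambda_1 \geq 2$.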



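Case 2: $\lambda_k = 0$, $\lambda_{k-1} \neq 1$.
In this case, let $\phi = (\lambda_1, \ldots, \lambda_{k-1} - 1, 0)$ and then let $\psi$ be equal to the partition $(\lambda_1 - 1, \ldots, \lambda_{k-1} - 1, 1)$.
Then we have

\[ \big( h(\phi) - h(\lambda) \big)/2 = (\lambda_1 - k)(\lambda_{k-1} - k + 1 - n) + \binom{n}{2}(\lambda_{k-1} - k + 1) \]
\[ \qquad + (n - \lambda_{k-1} + k - 1) \sum_{i=1}^{n} c_{\lambda|\mu}(i) - (\lambda_{k-1} - k + 1)^2 - 3\binom{n}{3} + n \]
\[ \leq (\lambda_1 - k)(\lambda_{k-1} - k + 1 - n) + \binom{n}{2}(\lambda_{k-1} - k + 1) + (n - \lambda_{k-1} + k - 1)\binom{n-1}{2} - (\lambda_{k-1} - k + 1)^2 - 3\binom{n}{3} + n \]
\[ = (\lambda_1 - \lambda_{k-1} - 2)(\lambda_{k-1} - k + 1 - n). \]

The inequality follows because $n - \lambda_{k-1} + k - 1 > 0$ and $\sum_i c_{\lambda|\mu}(i) \leq \binom{n-1}{2}$. If $\lambda_1 - \lambda_{k-1} - 2 \geq 0$, then this
case is shown as $\lambda_{k-1} - k + 1 - n < 0$. If not, then $\lambda_1 = \lambda_{k-1} + 1$ which implies $\lambda_j = \lambda_{k-1}$ for $j = 2, \ldots, k-1$
(since $\lambda_1 \geq \lambda_2 + 1$). In this case, $(k-1)\lambda_{k-1} = n$ and

\[ \sum_{i} c_{\lambda|\mu}(i) = \sum_{i=1}^{k-1} \sum_{j=1}^{\lambda_{k-1}} (j - i) = \frac{n}{2}(\lambda_{k-1} + 1 - k). \]

Then we have

\[ \big( h(\phi) - h(\lambda) \big)/2 = (\lambda_{k-1} + 1 - k)(\lambda_{k-1} - k + 1 - n) + \binom{n}{2}(\lambda_{k-1} - k + 1) \]
\[ \qquad + \frac{n}{2}(\lambda_{k-1} + 1 - k)(n - \lambda_{k-1} + k - 1) - (\lambda_{k-1} - k + 1)^2 - 3\binom{n}{3} + n. \]

This is a downward facing parabola in $\lambda_{k-1}$ with roots equal to $n + k - 1$ and $n + k - 4$, neither of which is
a possible value of $\lambda_{k-1}$ since $k$ must be at least three.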

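Case 3: $\lambda = (\lambda_1, \ldots, \lambda_{k-2}, 1, 0)$.

In this case, let $\phi = (\lambda_1, \ldots, \lambda_{k-2}, 0)$ and then let $\psi$ be equal to the partition $(\lambda_1 - 1, \ldots, \lambda_{k-2}, 1)$. Then
we have

\[ \big( h(\phi) - h(\lambda) \big)/2 = (n + k - 3) \left( \sum_{i=1}^{n} c_{\lambda|\mu}(i) + k - \lambda_1 - \binom{n-1}{2} \right) + n(4 - k) - 2\lambda_1 + k - (2 - k)^2 \qquad (17) \]
\[ \leq (n-1)(4 - \lambda_1) + k(2 - \lambda_1). \qquad (18) \]

To see the inequality, notice that $n + k - 3 > 0$ and $\sum_i c_{\lambda|\mu}(i) \leq \binom{n-1}{2}$.

The term (18) is clearly negative for $\lambda_1 \geq 4$, and the smaller cases can be individually handled starting
from (17) (for example $\lambda_1 = 2$ implies $\sum_i c_{\lambda|\mu}(i) = -\binom{n}{2}$ and $k = n + 1$). □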

The main result of this section is stated in the following.

Theorem 5.10. If $\lambda \succeq \mu$ for nontrivial irreducible representations $\lambda$ and $\mu$, then the error term from
Theorem 5.2 associated to $\lambda$ is no larger than the error term associated to $\mu$. In particular, the error term
is minimum for $\tau = (n-1, 1)$.

Proof. By the previous lemmas and remarks, the theorem will be proved if $T_{(3)}(\tau)$ and $T_{(2,2)}(\tau)$ are no less
than negative two (for $K$ equal to the identity or transposition conjugacy class, $T_K$ is constant). Using the
formulas from Lemma 5.3 or the fact [34] that for any permutation $g$, $\chi^\tau(g)$ is equal to the number of fixed
points of $g$ minus one and $\dim(\tau) = n - 1$, we have

\[ T_{(3)}(\tau) = -3/2, \]
\[ T_{(2,2)}(\tau) = -2. \]

□
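
As an illustration of Theorem 5.10 (not a proof), the values of $T_{(3)}$ and $T_{(2,2)}$ can be tabulated over all nontrivial partitions of a small $n$. The sketch below is hedged in several ways: the helper names `partitions` and `t_values` are hypothetical; it assumes $a_\lambda = 1 - \chi^\lambda_{(2)}/\dim(\lambda)$ and $T_K(\lambda) = (\chi^\lambda_K/\dim(\lambda) - 1)/a_\lambda$; and it uses the standard content formula for the class (2,2). It requires $n \geq 4$.

```python
from fractions import Fraction
from math import comb


def partitions(n, largest=None):
    """Yield every partition of n as a non-increasing tuple."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, largest), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest


def t_values(partition):
    """Return (T_(3), T_(2,2)) for a nontrivial partition, in exact arithmetic."""
    n = sum(partition)
    c = [col - row
         for row, length in enumerate(partition, start=1)
         for col in range(1, length + 1)]
    s1, s2 = sum(c), sum(x * x for x in c)
    a = 1 - Fraction(s1, comb(n, 2))   # zero only for the trivial partition (n)
    chi3 = Fraction(s2 - comb(n, 2), 2 * comb(n, 3))
    chi22 = Fraction(s1 * s1 - 3 * s2 + 2 * comb(n, 2), 6 * comb(n, 4))
    return (chi3 - 1) / a, (chi22 - 1) / a


n = 6
for lam in partitions(n):
    if lam != (n,):                    # skip the trivial representation
        print(lam, *t_values(lam))
```

In such a table the minimum of each column should occur at $(n-1, 1)$, though not necessarily uniquely: ties in $T_{(3)}$ appear for other two-row partitions.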



6 Eigenvalue Characterization 

Heuristically, increasing the step size of a Markov chain has the effect of making the chain become "random" 
faster. This suggests that for an ergodic chain, the step size may be related to the rate of convergence 
to stationarity. From basic facts about reversible Markov chains on a finite state space [S], all the chains 
previously defined are ergodic if and only if they are irreducible and aperiodic. In this case, the rate of 
convergence to stationarity is determined by the eigenvalues, where the eigenvalues with the largest moduli 
make the largest asymptotic contributions. Because of this relationship, we will express the error terms from 
the past sections in terms of the eigenvalues of the Markov chains used to induce the exchangeable pairs. 

We first examine the chains from Section 4 (and hence also Section 3 as previously mentioned). For a
fixed value of $t$, the eigenvalues for $L_t(i,j)$ have been computed in [22], but we include the proof since it is
illustrative. Recall the definitions and notation from Section [H 

Lemma 6.1. 122] For fixed t £ {0, 1, ... , n}, the eigenvalues of Lt{i,j) are ^^j^ for s — Q . . .n. 
Proof. By definitions and Lemma l4.3|, 



j=0 ^ r=0 i=0 



Ks{t) 



In other words, if ill is the transition matrix of Lt{i,j) and v is the vector with coordinate j equal to Ks{j), 
then Mv = i^^v. □ 

Vs 

A proof of the next lemma follows along the lines of the proof of Lemma 14.111 
Lemma 6.2. The eigenvalues of Lxii, j) as defined in Section^ are for s — 0, . . . ,n, 

t=Q ^ " 



Recall from the proof of Theorem 14.131 the bound on the error depended only on the following term for 
5 = 0,1,2, 

(19) 



1- ELo^. 



Ki(t) 



22 



By Lemma 6.2, (19) can be rewritten as $(\lambda_s - 1)/(1 - \lambda_1)$ for $s = 0, 1, 2$. For $s = 0$ and $s = 1$, (19) does not
depend on values $b_t$, so that we have the following theorem.

Theorem 6.3. The size of the error term given by Stein's method via the family of chains from Section 4
is a monotone increasing function of

\[ \frac{\lambda_2 - 1}{1 - \lambda_1}, \]

where $\lambda_1$ and $\lambda_2$ are defined as in Lemma 6.2.

A natural problem that arises is to ascertain under what setting the eigenvalues that determine the error 
term will be the eigenvalues with largest moduli (not including Aq = 1), as these are the eigenvalues with 
the largest contribution to the rate of convergence. It follows immediately from definitions that in the case 
&o + ^1 = 1 where the error term is minimum, As = 1 — ; so that the ordering of the eigenvalues 

corresponds to the subscript notation. In this case the eigenvalues that affect the error term will be the 
largest in moduli if and only if bi < '^q(l^^2) ■ 

For the chains $L_\tau$ from Section 5 the eigenvalues have already been computed in [22] using orthogonality
relations. Recall the definitions and notation from Section 5.

Lemma 6.4. [22] The eigenvalues of $L_\tau$ as defined in Section 5 are, for conjugacy classes $C$ of the group $G$,

\[ \lambda_C = \frac{\chi^\tau(C)}{\dim \tau}. \]

Also, from Theorem 5.2 and the remarks following it, the bound on the error term depends only on the
following term for the conjugacy classes $K = (id)$, $(2)$, $(3)$, and $(2,2)$,

\[ \frac{ \frac{\chi^\tau(K)}{\dim \tau} - 1 }{ 1 - \frac{\chi^\tau(2)}{\dim \tau} }. \qquad (20) \]

By Lemma 6.4, (20) can be rewritten as $(\lambda_K - 1)/(1 - \lambda_{(2)})$ for $K = (id)$, $(2)$, $(3)$, and $(2,2)$. For $K = (id)$
and $K = (2)$, (20) does not depend on the irreducible representation $\tau$ used to generate $L_\tau$, so that we have
the following theorem.

Theorem 6.5. The size of the error term given by Stein's method via the family of chains from Section 5
is a monotone increasing function of

\[ \frac{\lambda_K - 1}{1 - \lambda_{(2)}}, \]

where $\lambda_K$ is defined as in Lemma 6.4 and $K = (3)$ or $K = (2,2)$.

Once again it is natural to ask for a given irreducible representation $\tau$, whether the three eigenvalues
that affect the error term are those with the largest moduli (not including $\lambda_{(id)} = 1$). For $\tau = (n-1, 1)$, the
representation where the error term is minimum, it is well known [34] that

\[ \frac{\chi^\tau(K)}{\dim \tau} = \frac{F_K - 1}{n - 1}, \]

where $F_K$ is defined to be the number of fixed points of $K$. In this case, the eigenvalues that affect the error
term are the eigenvalues with the second, third, and fourth largest moduli.

7 Poisson Approximation



For the final two sections we change focus from approximation by the normal distribution to approximation
by the Poisson distribution. A metric routinely used for integer supported random variables $X$ and $Y$ is the
total variation distance defined by

\[ \|X - Y\|_{TV} = \frac{1}{2} \sum_k \left| P(X = k) - P(Y = k) \right|. \]

We will examine how the step size of the Markov chain that induces an exchangeable pair affects the error
term in the following theorem.

Theorem 7.1. Let $W = \sum_{i=1}^n X_i$ for random variables $X_i$, such that $E(W) = \lambda$. Let $(W, W')$ be an
exchangeable pair, $c$ any constant, and $Poi_\lambda$ denote the Poisson distribution with mean $\lambda$. Then for $C_\lambda$ a
constant only depending on $\lambda$,

\[ \|W - Poi_\lambda\|_{TV} \leq C_\lambda \Big[ E\big| \lambda - c P(W' = W + 1 | \{X_i\}) \big| + E\big| W - c P(W' = W - 1 | \{X_i\}) \big| \Big]. \qquad (21) \]

Remarks. 

1. It has been shown [S^ that a result similar to Theorem 7.1 but with additional error terms still holds
assuming only that $W$ and $W'$ are equally distributed.

2. Ideally, $c$ should be chosen so that we have the approximate equalities

\[ P(W' = W + 1 | \{X_i\}) \approx \frac{\lambda}{c}, \qquad (22) \]
\[ P(W' = W - 1 | \{X_i\}) \approx \frac{W}{c}. \qquad (23) \]

It is shown in [3] that intuitively the existence of such a constant is likely, a heuristic that is reinforced
in the examples presented there. In fact, if $(W' - W) \in \{-1, 0, 1\}$ and $E(W' - W | W) = -a(W - E(W))$,
it is easy to see that for the choice of $c = 1/a$ we have the same error in the approximate equalities
(22) and (23). This is in general a useful guide for the choice of the constant $c$ (for more on this line
of thought see [32]).

One of the main technical details in analyzing (21) for different exchangeable pairs is the choice of the
constant $c$. It would be preferable to have a systematic method of choosing this constant based on the
exchangeable pair so that the results here are not contrived. Ideally, we would choose the constant $c$ to
minimize the error terms from Theorem 7.1 or, more feasibly, their Cauchy-Schwarz bound (choose $c$ to
make the expectation of the terms in the absolute value signs zero). However, in the examples presented
here, we will choose the constant $c$ to yield the best possible bound under the constraint that the terms in the
absolute value signs are positive. Admittedly, part of the reason for this restriction is technical convenience,
but in practice choosing the constant in this way is typical (see the examples of [13]). In the next section we
will compare the error terms using both of these strategies in a small example.

Both of the examples presented here are sums of i.i.d. random variables (Bernoulli and geometric). Even
the simplest introduction of dependence (e.g. the hypergeometric distribution) yields results that make the
type of analysis in this paper difficult. Because we are in the setting of independence, we can use the same
family of Markov chains for both examples. Given a vector $(X_1, \ldots, X_n)$ of non-negative integer valued i.i.d.
random variables, the next step in the chain follows the rule of choosing $k$ coordinates uniformly at random
and replacing them with $k$ new i.i.d. random variables (with the same distribution as the original). It is not
hard to see that this chain is reversible with respect to vectors of i.i.d. random variables and hence generates
an exchangeable pair. Extending this exchangeable pair to the sum of the components of the vector allows
for the application of Theorem 7.1.

Finally, notice that under this chain it is not clear how modifying the number of coordinates selected
(i.e. varying $k$) will affect the error term.
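
One step of the chain described in this section can be sketched in a few lines of Python. This is an illustrative simulation only, not part of the paper's argument; the function names `chain_step` and `bernoulli` and the `sample` callback interface are assumptions made for the example.

```python
import random


def chain_step(x, k, sample, rng=random):
    """Return a copy of x with k uniformly chosen coordinates resampled.

    `sample(rng)` must draw one value from the common distribution of the
    coordinates; the input vector is left unchanged.
    """
    positions = rng.sample(range(len(x)), k)   # k distinct coordinates
    y = list(x)
    for pos in positions:
        y[pos] = sample(rng)
    return y


def bernoulli(p):
    """Sampler for a Bernoulli(p) coordinate."""
    return lambda rng: 1 if rng.random() < p else 0


rng = random.Random(0)
n, k = 10, 3
x = [bernoulli(1 / n)(rng) for _ in range(n)]
x_new = chain_step(x, k, bernoulli(1 / n), rng)
print(sum(x), sum(x_new))   # one draw of the pair (W, W')
```

Because at most $k$ coordinates change, $|W' - W| \leq k$ for Bernoulli coordinates, which is the sense in which $k$ plays the role of a step size.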



8 Binomial Distribution 

It is well known that the binomial distribution with parameters $n$ and $p$ converges to a Poisson distribution
with mean $\lambda$ as $n$ tends to infinity if $np$ tends to $\lambda$. For simplicity, in this section we consider the case where
$p = 1/n$, so that $\lambda = 1$. In this case we will show that among the exchangeable pairs associated with the
family of Markov chains described in Section 7 the term from Theorem 7.1 is minimized when $k = 1$. First
we will prove some lemmas that will be used to compute the error term from the theorem.

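Lemma 8.1. Let $P_k$ denote probability under the chain that substitutes $k$ coordinates as described in Section 7.
Then

\[ P_k(W' = W + 1 | \{X_i\}) = \sum_{i=0}^{k-1} \binom{k}{i+1} \frac{(n-1)^{k-i-1}}{n^k} \cdot \frac{\binom{W}{i}\binom{n-W}{k-i}}{\binom{n}{k}}, \]
\[ P_k(W' = W - 1 | \{X_i\}) = \sum_{i=1}^{k} \binom{k}{i-1} \frac{(n-1)^{k-i+1}}{n^k} \cdot \frac{\binom{W}{i}\binom{n-W}{k-i}}{\binom{n}{k}}. \]

Proof. Let the random variable $Y$ be the number of ones in the $k$ coordinates chosen. Then
$P_k(W' = W + 1 | Y = i)$ is the probability of $i + 1$ ones in the binomial distribution with parameters $k$ and $p = 1/n$,
which implies

\[ P_k(W' = W + 1 | \{X_i\}) = \sum_{i=0}^{k-1} P_k(W' = W + 1 | Y = i) \, P_k(Y = i | \{X_i\}) \]
\[ = \sum_{i=0}^{k-1} \binom{k}{i+1} \frac{(n-1)^{k-i-1}}{n^k} \cdot \frac{\binom{W}{i}\binom{n-W}{k-i}}{\binom{n}{k}}. \]

Conditioning and summing over $Y$ also yields the expression for $P_k(W' = W - 1 | \{X_i\})$ in the lemma. □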
For the remainder of the section define

\[ c_k = \frac{n}{k} \left( \frac{n}{n-1} \right)^{k-1}. \qquad (24) \]

The next two lemmas prove a useful property of the constant $c_k$.

Lemma 8.2. $1 - c_k P_k(W' = W + 1 | \{X_i\}) \geq 0$.



Proof. Conditioning and summing over the random variable $Y$ equal to the number of ones chosen in the $k$
coordinates,

\[ c_k P_k(W' = W + 1 | \{X_i\}) = \sum_{i=0}^{k-1} \frac{1}{i+1} \binom{k-1}{i} \frac{1}{(n-1)^i} \cdot \frac{\binom{W}{i}\binom{n-W}{k-i}}{\binom{n}{k}}
 \leq \sum_{i=0}^{k-1} \frac{\binom{W}{i}\binom{n-W}{k-i}}{\binom{n}{k}} \leq 1. \]

Here the first inequality follows from the fact that $k \leq n$. □

Lemma 8.3. $W - c_k P_k(W' = W - 1 | \{X_i\}) \geq 0$.



Proof. For $W = 0$, the lemma is trivially true. For $W \neq 0$, we condition and sum over the random variable
$Y$ equal to the number of ones chosen in the $k$ coordinates to obtain

\[ c_k P_k(W' = W - 1 | \{X_i\}) = c_k \sum_{i=1}^{k} P_k(W' = W - 1 | Y = i) \, P_k(Y = i | \{X_i\}) \]
\[ = \sum_{i=1}^{k} \frac{1}{k} \binom{k}{i-1} (n-1)^{2-i} \cdot \frac{\binom{W}{i}\binom{n-W}{k-i}}{\binom{n}{k}} \]
\[ = W \sum_{i=1}^{k} \frac{\binom{k}{i-1}}{i \, n (n-1)^{i-2}} \cdot \frac{\binom{W-1}{i-1}\binom{(n-1)-(W-1)}{(k-1)-(i-1)}}{\binom{n-1}{k-1}} \leq W. \]

To see the final inequality, notice that for each summand, the second part of the product is a probability of
the hypergeometric distribution and the first part of the product is at most one.

□

The previous two lemmas show that for $c = c_k$ defined by (24), the terms within the absolute values in
(21) are positive. Under this constraint, note that the error terms are decreasing in $c_k$ and that
$1 - c_k P_k(W' = W + 1 | W = 0) = 0$. These observations imply that among constants satisfying Lemmas 8.2 and 8.3 the error
from Theorem 7.1 is minimized for each $k$ when $c_k$ is defined as (24). As discussed in the previous section,
this is a natural way to choose the constant in the approximation that allows for the analysis done here.

We pause here to show in a simple example the difference in the error terms from Theorem 7.1 when
choosing the constant $c$ according to the two approaches outlined in Section 7. First, we will determine the
error terms using the strategy we take here in the case $k = 1$. For the following computations, recall that
$p = 1/n$. We have

\[ 1 - c_1 P_1(W' = W + 1 | \{X_i\}) = 1 - (n - W)p = pW, \]
\[ W - c_1 P_1(W' = W - 1 | \{X_i\}) = W - W(1 - p) = pW, \]

which implies the error from Theorem 7.1 is equal to $2p$ (from [13], $C_\lambda = 1$ for $\lambda \leq 1$). Choosing instead
$c_1' = n/(1 - p)$, so that $c_1' P_1(W' = W + 1) = 1$ (this was discussed as the alternative system of choosing $c$),
we obtain

\[ 1 - c_1' P_1(W' = W + 1 | \{X_i\}) = 1 - \frac{(n - W)p}{1 - p} = \frac{p(W - 1)}{1 - p}, \]
\[ W - c_1' P_1(W' = W - 1 | \{X_i\}) = W - \frac{n}{1 - p} \cdot \frac{W(1 - p)}{n} = 0, \]

which implies the error from Theorem 7.1 is equal to

\[ \frac{p \, E|W - 1|}{1 - p} \leq \frac{p}{\sqrt{1 - p}}. \]

In the limit, the two error terms differ in quality only by a constant and $c_1$ is asymptotically equal to $c_1'$.
Although the Cauchy-Schwarz approach using $c_1'$ asymptotically yields a better constant, computing the
appropriate moment information for general $k$ using this scheme is much more difficult than the strategy we
choose. Also, this small example suggests that the Cauchy-Schwarz approach will yield superior asymptotic
rates only in the constant, so that it is not worth the extra effort of computing the more complicated (and
higher) moment information needed in order to undertake the type of analysis presented in this paper.
Finally, we note that using the chain here (with $k = 1$), it is possible to use intermediate terms in the proof
of Theorem 7.1 with the constant $c_1$ to obtain the superior upper bound of $p$ [13]; however, this approach
does not carry over to the chains with larger step size.
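
The two choices of constant in this $k = 1$ example can be compared numerically. The sketch below is an illustration under stated assumptions: it takes $C_\lambda = 1$, uses $p = 1/n$, and evaluates the two expectations exactly against the binomial probability mass function; the function names `binomial_pmf` and `error_terms` are hypothetical.

```python
from math import comb


def binomial_pmf(n, p, w):
    """P(W = w) for W ~ Binomial(n, p)."""
    return comb(n, w) * p ** w * (1 - p) ** (n - w)


def error_terms(n):
    """Return the bracketed error in (21) for c_1 = n and for c_1' = n/(1-p), with p = 1/n."""
    p = 1.0 / n
    pmf = [binomial_pmf(n, p, w) for w in range(n + 1)]
    # c_1 = n: both absolute-value terms equal p * W.
    first = sum(q * (abs(p * w) + abs(p * w)) for w, q in enumerate(pmf))
    # c_1' = n / (1 - p): the terms are p (W - 1) / (1 - p) and 0.
    second = sum(q * abs(p * (w - 1) / (1 - p)) for w, q in enumerate(pmf))
    return first, second


for n in (10, 100, 1000):
    print(n, *error_terms(n))
```

Since $E(W) = 1$ here, $E|W - 1| = 2P(W = 0)$, so the second error equals $2p(1 - p)^{n-1}$; both errors are of order $p$ and differ only in the constant, in line with the discussion above.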

Moving forward, in order to apply the theorem, we need to take the expected value of the terms in Lemma
8.1. The next lemma has a nice expression for the expectation we need.



Lemma 8.4.

\[ E\left[ \binom{W}{i} \binom{n - W}{k - i} \right] = \frac{n(n-1)\cdots(n-k+1)}{i! \, (k-i)!} \cdot \frac{(n-1)^{k-i}}{n^k}. \]

Proof.

\[ E\left( s^W r^{n-W} \right) = \left( \frac{s}{n} + \frac{(n-1) r}{n} \right)^n. \]

Taking $i$ derivatives with respect to $s$ and $k - i$ derivatives with respect to $r$ and evaluating at
$r = s = 1$ implies the lemma. □


The final lemma in this section establishes bounds on the error from Theorem 7.1.

Lemma 8.5. Both of $E[c_k P_k(W' = W + 1 | \{X_i\})]$ and $E[c_k P_k(W' = W - 1 | \{X_i\})]$ are bounded above by
$(n-1)/n$.

Proof. From Lemmas 8.1 and 8.4 we have

\[ E\big[ c_k P_k(W' = W + 1 | \{X_i\}) \big] = \frac{n}{k} \left( \frac{n}{n-1} \right)^{k-1} \sum_{i=0}^{k-1} \binom{k}{i+1} \frac{(n-1)^{k-i-1}}{n^k} \cdot \binom{k}{i} \frac{(n-1)^{k-i}}{n^k} \]
\[ = \frac{n-1}{n} \sum_{i=0}^{k-1} \binom{k-1}{i} \frac{(n-1)^{k-1-i}}{n^{k-1}} \cdot \frac{\binom{k}{i}}{(i+1)(n-1)^i} \]
\[ \leq \frac{n-1}{n}. \]

The inequality follows from the fact that each summand is the product of a binomial probability and a
factor no larger than one.

For the remaining term, exchangeability implies $E[c_k P_k(W' = W - 1 | \{X_i\})] = E[c_k P_k(W' = W + 1 | \{X_i\})]$,
which proves the lemma. □

Theorem 8.6. For the values of $c_k$ defined previously, the error term from Theorem 7.1 is minimized for
$k = 1$ and is equal to $2p$.

Proof. The final bound was computed previously in this section and the fact that it is minimum follows
directly from Lemma 8.5 and the easily verified fact

\[ E\big[ c_1 P_1(W' = W + 1 | \{X_i\}) \big] = E\big[ c_1 P_1(W' = W - 1 | \{X_i\}) \big] = \frac{n-1}{n}. \]

□



9 Negative Binomial Distribution 

The final example presented in this paper is the approximation of the negative binomial distribution by the
Poisson. A random variable $X$ has the geometric distribution with parameter $p$ if $P(X = i) = (1 - p)^i p$ for
all non-negative integers $i$. Classically, the random variable $X$ is viewed as the number of failures before
the first success in a sequence of independent Bernoulli trials each with probability of success equal to $p$.
The random variable $W$ is negative binomial with parameters $r$ and $p$ if $W = \sum_{i=1}^r X_i$ where the $X_i$ are
independent geometric random variables with parameter $p$. By viewing $W$ as the number of failures before
$r$ successes have occurred in a sequence of Bernoulli trials, it is easy to see that for all non-negative integers
$i$, $P(W = i) = \binom{i + r - 1}{i} (1 - p)^i p^r$. We will use Theorem 7.1 to approximate $W$ by $Poi(\lambda)$ where $\lambda$ is the mean
of $W$ equal to $r(1 - p)/p$ (the mean of a geometric random variable is $(1 - p)/p$).
For fixed $\lambda$, $p = r/(\lambda + r)$ so that

\[ P(W = i) = \frac{\lambda^i}{i!} \cdot \frac{(r + i - 1)!}{(r - 1)! \, (\lambda + r)^i} \left( 1 + \frac{\lambda}{r} \right)^{-r}. \qquad (26) \]

As $r$ goes to infinity, the distribution converges to a Poisson distribution with mean $\lambda$. However, for fixed $\lambda$,
$p$ approaches one as $r$ goes to infinity, so that when $p$ is small the negative binomial will not be approximately
Poisson. Because of this fact, in this example we will not obtain a result as straightforward as Theorem
8.6. For some values of $p$, the optimal error term does not occur with the smallest step size. We will prove
all supporting lemmas for general $p$, but the final theorem will have a natural restriction on the value of $p$.
For this case we will show that among the exchangeable pairs associated with the family of Markov chains
described in Section 7 the term from Theorem 7.1 is minimized when $k = 1$. First we will prove some lemmas
that will be used to compute the error term from the theorem.

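Lemma 9.1. Let $P_k$ denote probability under the chain that substitutes $k$ coordinates as described in
Section 7. Then

\[ P_k(W' = W+1 \mid \{X_i\}) = \binom{r}{k}^{-1} \sum_{\{i_1,\dots,i_k\} \subseteq \{1,\dots,r\}} \binom{k + \sum_{j} X_{i_j}}{k-1} (1-p)^{\sum_{j} X_{i_j} + 1} p^k, \]

\[ P_k(W' = W-1 \mid \{X_i\}) = \binom{r}{k}^{-1} \sum_{\{i_1,\dots,i_k\} \subseteq \{1,\dots,r\}} \binom{k + \sum_{j} X_{i_j} - 2}{k-1} (1-p)^{\sum_{j} X_{i_j} - 1} p^k, \]

with the convention that the summand in the second expression vanishes when $\sum_{j} X_{i_j} = 0$.

Proof. This follows immediately from conditioning and summing over the subset of $(X_1, \dots, X_r)$ chosen. □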
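The conditioning argument can be illustrated with exact arithmetic. The sketch below assumes (Section 7 is not reproduced here) that the chain chooses a $k$-subset of coordinates uniformly at random and replaces them with independent geometric variables. It compares the probability obtained by direct convolution of geometric distributions with the negative binomial closed form $\binom{m+k-1}{k-1}(1-p)^m p^k$ for the new block sum $m$. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

def geometric_sum_pmf(k, m, p):
    """P(G_1 + ... + G_k = m) for iid geometric(p) G_j, by repeated convolution."""
    if m < 0:
        return Fraction(0)
    geom = [(1 - p) ** i * p for i in range(m + 1)]
    dist = [Fraction(1)] + [Fraction(0)] * m  # distribution of the empty sum
    for _ in range(k):
        dist = [sum(dist[j] * geom[n - j] for j in range(n + 1)) for n in range(m + 1)]
    return dist[m]

def step_prob_convolution(xs, k, p, shift):
    """P_k(W' = W + shift | xs): average over k-subsets of the probability that
    the replaced block sums to (old block sum + shift)."""
    subsets = list(combinations(range(len(xs)), k))
    total = sum(geometric_sum_pmf(k, sum(xs[i] for i in s) + shift, p) for s in subsets)
    return total / len(subsets)

def step_prob_closed_form(xs, k, p, shift):
    """The same probability using the negative binomial closed form."""
    subsets = list(combinations(range(len(xs)), k))
    total = Fraction(0)
    for s in subsets:
        m = sum(xs[i] for i in s) + shift  # required value of the new block sum
        if m >= 0:
            total += comb(m + k - 1, k - 1) * (1 - p) ** m * p ** k
    return total / len(subsets)

if __name__ == "__main__":
    p = Fraction(2, 3)
    print(step_prob_closed_form([0, 2, 1, 3], 2, p, +1))
    print(step_prob_convolution([0, 2, 1, 3], 2, p, +1))
```

Using `Fraction` keeps the comparison exact, so the two computations agree as rational numbers rather than up to rounding.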
For the remainder of the section define

\[ c_k = \lambda a_k \left[ k \binom{k + a_k - 2}{k-1} (1-p)^{a_k} p^{k-1} \right]^{-1}, \tag{27} \]

where $a_k$ is the maximum of one and the integer part of $(k-2)(1-p)/p$. The next lemma states a useful
property of $a_k$; the proof can be found in [27], but it is elementary so we will include it.



Lemma 9.2. Let $r$ be a positive integer, $x$ any non-negative integer, and $a$ be the integer part of
$(r-1)(1-p)/p$. If $f(y) = \binom{y+r-1}{y}(1-p)^y p^r$, then $f(x) \ge f(x-1)$ for $x \le a$ and $f$ is
strictly decreasing otherwise. In particular, $a$ is the mode of a negative binomial random variable with
parameters $r$ and $p$.

Proof. The ratio of $f$ evaluated at consecutive integers is given by

\[ \frac{f(x+1)}{f(x)} = \left( \frac{x+r}{x+1} \right)(1-p). \]

Comparing this ratio to one implies the lemma. □
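The monotonicity statement is easy to test exactly. The sketch below reads the lemma as $f(x) \ge f(x-1)$ for $x \le a$ and $f(x) < f(x-1)$ for $x > a$, with $a$ the integer part of $(r-1)(1-p)/p$. That reading, the function names and the finite search range are all assumptions of the illustration.

```python
from fractions import Fraction
from math import comb, floor

def nb_kernel(y, r, p):
    """f(y): the negative binomial(r, p) probability of the value y."""
    return comb(y + r - 1, y) * (1 - p) ** y * p ** r

def claimed_mode(r, p):
    """Integer part of (r - 1)(1 - p)/p."""
    return floor((r - 1) * (1 - p) / p)

def lemma_holds(r, p, upto=60):
    """Check f(x) >= f(x-1) exactly when x <= a, for 1 <= x < upto."""
    a = claimed_mode(r, p)
    for x in range(1, upto):
        nondecreasing = nb_kernel(x, r, p) >= nb_kernel(x - 1, r, p)
        if nondecreasing != (x <= a):
            return False
    return True

if __name__ == "__main__":
    print(claimed_mode(5, Fraction(1, 3)), lemma_holds(5, Fraction(1, 3)))
```

When $(r-1)(1-p)/p$ is an integer, as for $r = 5$, $p = 1/3$, the values $f(a)$ and $f(a-1)$ coincide, which is why the first inequality is not strict.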

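The next two lemmas prove a useful property of the constant $c_k$ as defined by (27).

Lemma 9.3. $\lambda - c_k P_k(W' = W+1 \mid \{X_i\}) \ge 0$.

Proof. The case $k = 1$ is simple to verify, so assume $k \ge 2$. By Lemma 9.1 and the definition of $c_k$,

\[ c_k P_k(W' = W+1 \mid \{X_i\}) = \lambda \binom{r}{k}^{-1} \sum_{\{i_1,\dots,i_k\} \subseteq \{1,\dots,r\}} \frac{a_k\, p \binom{k + \sum_j X_{i_j}}{k-1} (1-p)^{\sum_j X_{i_j} + 1}}{k \binom{k+a_k-2}{k-1} (1-p)^{a_k}}. \]

Define $b$ to be the maximum of one and the integer part of $(k-1)(1-p)/p$ and note that $b = 1$ implies
$a_k = 1$. Lemma 9.2 implies

\begin{align*}
c_k P_k(W' = W+1 \mid \{X_i\})
  &\le \lambda \binom{r}{k}^{-1} \sum_{\{i_1,\dots,i_k\} \subseteq \{1,\dots,r\}} \frac{a_k\, p \binom{k+b-1}{k-1} (1-p)^{b}}{k \binom{k+a_k-2}{k-1} (1-p)^{a_k}} \\
  &= \lambda \left( \frac{(k+b-1)p}{k} \right) \frac{\binom{k+b-2}{b}(1-p)^{b}}{\binom{k+a_k-2}{a_k}(1-p)^{a_k}} \le \lambda.
\end{align*}

The final inequality follows from the definition of $b$ and by applying Lemma 9.2 with $r = k-1$. □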
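Lemma 9.4. $W - c_k P_k(W' = W-1 \mid \{X_i\}) \ge 0$.

Proof. The cases $W = 0$ or $k = 1$ are simple to verify, so assume otherwise. By Lemma 9.1 and the definition
of $c_k$,

\begin{align*}
c_k P_k(W' = W-1 \mid \{X_i\})
  &= \frac{r}{k} \binom{r}{k}^{-1} \sum_{\{i_1,\dots,i_k\} \subseteq \{1,\dots,r\}} \Big( \sum_j X_{i_j} \Big) \frac{\binom{k + \sum_j X_{i_j} - 2}{k-2} (1-p)^{\sum_j X_{i_j}}}{\binom{k+a_k-2}{k-2}(1-p)^{a_k}} \\
  &\le \frac{r}{k} \binom{r}{k}^{-1} \sum_{\{i_1,\dots,i_k\} \subseteq \{1,\dots,r\}} \Big( \sum_j X_{i_j} \Big) = \sum_{i=1}^{r} X_i = W.
\end{align*}

An application of Lemma 9.2 with $r = k-1$ implies the inequality. □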



The previous two lemmas show that for $c = c_k$ defined by (27), the terms within the absolute values in
(21) are positive. Also note that

\[ c_k P_k(W' = W-1 \mid X_1 = a_k,\ X_i = 0,\ i > 1) = a_k = W \ne 0, \]

so that among constants satisfying Lemmas 9.3 and 9.4 the error from Theorem 7.1 is minimized for each
$k$ when $c_k$ is defined as in (27).

To apply the theorem, we need to take the expected value of the terms in Lemma 9.1. The next lemma
gives a convenient expression for the expectation we need.
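The positivity statements of Lemmas 9.3 and 9.4 can be checked exhaustively on small configurations. The sketch below makes several assumptions that should be read as such: the constant is taken to be $c_k = \lambda a_k / \big(k \binom{k+a_k-2}{k-1}(1-p)^{a_k}p^{k-1}\big)$ with $a_k = \max(1, \lfloor (k-2)(1-p)/p \rfloor)$, the chain replaces a uniformly chosen $k$-subset by fresh geometric variables, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, product
from math import comb, floor

def a_k(k, p):
    """Maximum of one and the integer part of (k - 2)(1 - p)/p."""
    return max(1, floor((k - 2) * (1 - p) / p))

def c_k(k, r, p):
    """The constant c_k, under the reading of (27) stated in the lead-in."""
    lam = r * (1 - p) / p
    a = a_k(k, p)
    return lam * a / (k * comb(k + a - 2, k - 1) * (1 - p) ** a * p ** (k - 1))

def step_prob(xs, k, p, shift):
    """P_k(W' = W + shift | xs), averaging over uniformly chosen k-subsets."""
    subsets = list(combinations(range(len(xs)), k))
    total = Fraction(0)
    for s in subsets:
        m = sum(xs[i] for i in s) + shift  # required value of the new block sum
        if m >= 0:
            total += comb(m + k - 1, k - 1) * (1 - p) ** m * p ** k
    return total / len(subsets)

def both_terms_nonnegative(xs, k, p):
    """Check lam - c_k P_k(W'=W+1) >= 0 and W - c_k P_k(W'=W-1) >= 0."""
    r = len(xs)
    lam = r * (1 - p) / p
    c = c_k(k, r, p)
    return (lam - c * step_prob(xs, k, p, 1) >= 0
            and sum(xs) - c * step_prob(xs, k, p, -1) >= 0)

if __name__ == "__main__":
    p = Fraction(1, 3)
    print(all(both_terms_nonnegative(xs, 2, p) for xs in product(range(4), repeat=4)))
```

The configuration $X_1 = a_k$, $X_i = 0$ for $i > 1$ gives equality in the second inequality, which is the sense in which this choice of $c_k$ cannot be enlarged.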

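Lemma 9.5. If $Y$ is a random variable distributed as negative binomial with parameters $p$ and $k$, then

\[ E\left[ \binom{k+Y}{k-1} (1-p)^{Y} \right] = \frac{\sum_{l=0}^{k-1} \binom{k}{l+1} \binom{k+l-1}{k-1} \left( \frac{(1-p)^2}{p(2-p)} \right)^{l}}{(2-p)^{k}}. \tag{29} \]

Proof. By the definition of expected value,

\begin{align*}
E\left[ \binom{k+Y}{k-1} (1-p)^{Y} \right]
  &= \sum_{i \ge 0} \binom{k+i}{k-1} (1-p)^{i} \binom{k+i-1}{k-1} (1-p)^{i} p^{k} \\
  &= \frac{\sum_{i \ge 0} \binom{k+i}{k-1} \binom{k+i-1}{k-1} \big( (1-p)^2 \big)^{i} \big( p(2-p) \big)^{k}}{(2-p)^{k}}. \tag{28}
\end{align*}

If $Z$ is a random variable distributed as negative binomial with parameters $q = p(2-p)$ and $k$, then (28)
can be written as $E\left[ \binom{k+Z}{k-1} \right] / (2-p)^{k}$. Using the fact that $Z$ is the sum of
independent geometric random variables we have

\[ E\left[ s^{Z+k} \right] = \left( \frac{qs}{1-(1-q)s} \right)^{k}. \]

Taking $k-1$ derivatives with respect to $s$ and dividing by $(k-1)!$ implies

\begin{align*}
E\left[ \binom{k+Z}{k-1} s^{Z+1} \right]
  &= \frac{q^{k}}{(k-1)!} \frac{d^{k-1}}{ds^{k-1}} \left[ s^{k} \big( 1-(1-q)s \big)^{-k} \right] \\
  &= \frac{q^{k}}{(k-1)!} \sum_{l=0}^{k-1} \binom{k-1}{l} \frac{k!}{(l+1)!}\, s^{l+1} (1-q)^{l}\, \frac{(k+l-1)!}{(k-1)!} \left( \frac{1}{1-(1-q)s} \right)^{k+l}.
\end{align*}

Finally, substituting $s = 1$ into this expression implies the lemma. □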
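The identity of Lemma 9.5 lends itself to a quick numerical check. The sketch below compares a truncated version of the defining series with a closed form read as $\sum_{l=0}^{k-1}\binom{k}{l+1}\binom{k+l-1}{k-1}\big((1-p)^2/(p(2-p))\big)^l/(2-p)^k$. That reading, the truncation length and the function names are assumptions of the illustration.

```python
from math import comb

def expectation_by_series(k, p, terms=2000):
    """E[C(k+Y, k-1) (1-p)^Y] for Y ~ NB(k, p), truncating the series at `terms`."""
    total = 0.0
    for i in range(terms):
        pmf_weight = comb(k + i - 1, k - 1) * (1 - p) ** i * p ** k  # P(Y = i)
        total += comb(k + i, k - 1) * (1 - p) ** i * pmf_weight
    return total

def expectation_closed_form(k, p):
    """Finite-sum expression, under the reading stated in the lead-in."""
    x = (1 - p) ** 2 / (p * (2 - p))
    s = sum(comb(k, l + 1) * comb(k + l - 1, k - 1) * x ** l for l in range(k))
    return s / (2 - p) ** k

if __name__ == "__main__":
    for k in (1, 2, 5):
        print(k, expectation_by_series(k, 0.6), expectation_closed_form(k, 0.6))
```

For $k = 1$ both expressions reduce to $1/(2-p)$, the value of $E[(1-p)^X]$ for a geometric $X$.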



The final results of this section will be stated in two cases. The first case will pertain to "small" values
of $k$ where $(k-1)(1-p)/p < 1$, and the "large" case to all other values of $k$. For fixed $k$ and $\lambda$,
the small case is in some sense the typical case, as $p$ should be near one in order for $W$ to be approximately
Poisson. In this case, there is no need for further restrictions on the value of $p$ in order to prove results
analogous to the previous section. However, in the large case additional assumptions will be made. We will
first state and prove results for the small case, then discuss the additional assumptions and prove results for
the large case.



Lemma 9.6. For $(k-1)(1-p)/p < 1$, both of $E[c_k P_k(W' = W+1 \mid \{X_i\})]$ and
$E[c_k P_k(W' = W-1 \mid \{X_i\})]$ are bounded above by

\[ \frac{r(1-p)}{2-p}. \]


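Proof. For $k = 1, 2$, the lemma is easy to verify, so assume $k \ge 3$. Let $Y$ be a random variable distributed
as negative binomial with parameters $p$ and $k$. Then we have

\begin{align*}
c_k E[P_k(W' = W+1 \mid \{X_i\})]
  &= c_k (1-p) p^{k}\, E\left[ \binom{k+Y}{k-1} (1-p)^{Y} \right] \\
  &= c_k (1-p) p^{k}\, \frac{\sum_{l=0}^{k-1} \binom{k}{l+1} \binom{k+l-1}{k-1} \left( \frac{(1-p)^2}{p(2-p)} \right)^{l}}{(2-p)^{k}} \tag{30} \\
  &= \frac{r(1-p)}{2-p} \left( \frac{1}{k(2-p)^{k-1}} \right) \sum_{l=0}^{k-1} \binom{k}{l+1} \left( \frac{1-p}{p(2-p)} \right)^{l} \binom{k+l-1}{k-1} (1-p)^{l},
\end{align*}

where the last equality uses that $a_k = 1$ in the small case, so that $c_k = r/(k p^{k})$.

Noting first that $(k-1)(1-p)/p < 1$, an application of Lemma 9.2 implies $\binom{k+l-1}{k-1}(1-p)^{l}$ is at
most one; since also $(2-p)^{-l} \le 1$, this yields the following inequality.

\begin{align*}
c_k E[P_k(W' = W+1 \mid \{X_i\})]
  &\le \frac{r(1-p)}{2-p} \left( \frac{1}{k(2-p)^{k-1}} \right) \sum_{l=0}^{k-1} \binom{k}{l+1} \left( \frac{1-p}{p} \right)^{l} \\
  &= \left( \frac{r(1-p)}{2-p} \right) \left( \frac{1-p^{k}}{k(1-p)(p(2-p))^{k-1}} \right).
\end{align*}

From the previous lines, it is enough to show for all $3 \le k \le r$, the following term is at most one:

\[ \frac{1-p^{k}}{k(1-p)(p(2-p))^{k-1}}. \tag{31} \]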
The difference of (31) applied at $k+1$ and $k$ is positively proportional to

\[ k \Big( \sum_{j=0}^{k} p^{j} \Big) - (k+1)\, p(2-p) \Big( \sum_{j=0}^{k-1} p^{j} \Big). \tag{32} \]

We will show that this difference is at most zero, which implies (31) is decreasing in $k$, so that it is enough
to show the lemma holds in the case where $k = 3$. Notice that $(k-1)(1-p)/p < 1$ implies $k < 1/(1-p)$, so that

\begin{align*}
k \Big( \sum_{j=0}^{k} p^{j} \Big) - (k+1)\, p(2-p) \Big( \sum_{j=0}^{k-1} p^{j} \Big)
  &= \Big( \sum_{j=0}^{k-1} p^{j} \Big) \big( k(1-p)^2 - p(2-p) \big) + k p^{k} \\
  &\le \Big( \sum_{j=0}^{k-1} p^{j} \Big) (1 - 3p + p^2) + k p^{k}. \tag{33}
\end{align*}

The small $k$ condition for $k \ge 3$ implies in particular that $2/3 < p < 1$ so that $1 - 3p + p^2 < 0$ which,
starting from (33), yields

\begin{align*}
k \Big( \sum_{j=0}^{k} p^{j} \Big) - (k+1)\, p(2-p) \Big( \sum_{j=0}^{k-1} p^{j} \Big)
  &\le (1 - 3p + p^2) + (1 - 3p + p^2)(k-1) p^{k-1} + k p^{k} \\
  &= (1 - 3p + p^2) + p^{k-1} \big( k(1-p)^2 - (1 - 3p + p^2) \big) \\
  &\le 1 - 3p + p^2 + p^{k}(2-p) \le 1 - 3p + p^2 + 2p^3 - p^4. \tag{34}
\end{align*}

The penultimate inequality follows from the fact noted above that $k < 1/(1-p)$, and the final inequality
since $k \ge 3$. From this point it is a straightforward calculus exercise to show the final term in (34) is
negative for $2/3 < p < 1$.

For the remaining term, exchangeability implies

\[ E[c_k P_k(W' = W-1 \mid \{X_i\})] = E[c_k P_k(W' = W+1 \mid \{X_i\})], \tag{35} \]

which proves the lemma. □
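The calculus facts used in the proof of Lemma 9.6 can be spot-checked on a grid. The sketch below assumes the term to be bounded is $(1-p^k)/\big(k(1-p)(p(2-p))^{k-1}\big)$, that the difference of consecutive terms is proportional to $k\sum_{j=0}^{k}p^j - (k+1)p(2-p)\sum_{j=0}^{k-1}p^j$, and that the final polynomial is $1 - 3p + p^2 + 2p^3 - p^4$. Function names and the grid are hypothetical choices.

```python
def term_31(k, p):
    """The quantity that should be at most one in the small case."""
    return (1 - p ** k) / (k * (1 - p) * (p * (2 - p)) ** (k - 1))

def difference_32(k, p):
    """Quantity proportional to term_31(k + 1, p) - term_31(k, p)."""
    upper = sum(p ** j for j in range(k + 1))
    lower = sum(p ** j for j in range(k))
    return k * upper - (k + 1) * p * (2 - p) * lower

def quartic_34(p):
    """Final bound in the chain of inequalities; negative on (2/3, 1)."""
    return 1 - 3 * p + p ** 2 + 2 * p ** 3 - p ** 4

def in_small_case(k, p):
    """The small-case condition (k - 1)(1 - p)/p < 1."""
    return (k - 1) * (1 - p) / p < 1

if __name__ == "__main__":
    for p in (0.7, 0.8, 0.9, 0.99):
        k = 3
        while in_small_case(k, p):
            print(p, k, term_31(k, p), difference_32(k, p))
            k += 1
```

A grid check of this kind is only a sanity test of the algebra; it does not replace the argument in the proof.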
In order to prove a result analogous to Lemma 9.6 for the case of $k \ge 1/(1-p)$, the value of $p$ will be
restricted. Ideally, the values of $p$ under consideration should coincide with the values of $p$ where the
Poisson distribution is a good approximation to the negative binomial distribution. First note that
$\mathrm{Var}(W) = \lambda + \lambda^2/r$ so that it is not unreasonable to assume that $\lambda^2 < r$. We will
instead use a stronger restriction; for the remainder of the section take $3\lambda^2 e^{\lambda} < r$. This may
seem like a demanding constraint, but from (26) it is clear that in order for the negative binomial distribution
to resemble a Poisson distribution, $e^{\lambda}$ should be close to $(1 + \lambda/r)^{r}$. The next lemma shows
that the assumption on $r$ is not unreasonable in light of the previous statement; it can be proved by standard
analysis using the Taylor expansion of the appropriate functions.

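Lemma 9.7. For $r > \lambda^2 > 0$,

\[ \left( \frac{\cdots}{7r} \right) \le e^{\lambda} - \left( 1 + \frac{\lambda}{r} \right)^{r} \le \left( \frac{\cdots}{2r} \right). \]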

Remark. It has been shown [10] that $\| W - \mathrm{Poi}(\lambda) \|_{TV} \le \lambda/r$, which implies that for
some values of $p$ and $r$ the restriction $3\lambda^2 e^{\lambda} < r$ is an overly demanding constraint. It is
an interesting problem to consider what is the minimum constraint that will yield results analogous to Section 8
and how it relates to the proximity of the negative binomial to the Poisson distribution.

Lemma 9.8. For $3\lambda^2 e^{\lambda} < r$ and $k \ge 1/(1-p)$, both of $E[c_k P_k(W' = W+1 \mid \{X_i\})]$ and
$E[c_k P_k(W' = W-1 \mid \{X_i\})]$ are bounded above by

\[ \frac{r(1-p)}{2-p}. \]

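Proof. Let $Y$ be a random variable distributed as negative binomial with parameters $p$ and $k$. Then
continuing from (30), for $k \ge 2$,

\[ c_k E[P_k(W' = W+1 \mid \{X_i\})] = \frac{r(1-p)}{2-p} \left( \frac{(k-1)(1-p)}{k(2-p)^{k-1}} \right) \sum_{l=0}^{k-1} \binom{k}{l+1} \left( \frac{1-p}{p(2-p)} \right)^{l} \frac{\binom{k+l-1}{l}(1-p)^{l}}{\binom{k+a_k-2}{a_k}(1-p)^{a_k}}. \]

Using the definition of $a_k$, Lemma 9.2 and an argument similar to the use of the constant $b$ in the proof of
Lemma 9.3 (using the fact that $k \ge 1/(1-p)$), we obtain an upper bound of $1/p$ on the appropriate fraction
in each summand, which yields the following inequality.

\begin{align*}
c_k E[P_k(W' = W+1 \mid \{X_i\})]
  &\le \frac{r(1-p)}{2-p} \left( \frac{(k-1)(1-p)}{k p (2-p)^{k-1}} \right) \sum_{l=0}^{k-1} \binom{k}{l+1} \left( \frac{1-p}{p(2-p)} \right)^{l} \\
  &= \frac{r(1-p)}{2-p} \left( \frac{k-1}{k(2-p)^{k-2}} \right) \left( \left( 1 + \frac{1-p}{(2-p)p} \right)^{k} - 1 \right).
\end{align*}

From the previous lines, it is enough to show for all $2 \le k \le r$

\[ \left( \frac{k-1}{k(2-p)^{k-2}} \right) \left( \left( 1 + \frac{1-p}{(2-p)p} \right)^{k} - 1 \right) \le 1. \tag{36} \]

The ratio of successive terms is equal to

\[ \left( \frac{k^2}{k^2-1} \right) \frac{1}{2-p} \cdot \frac{\left( 1 + \frac{1-p}{p(2-p)} \right)^{k+1} - 1}{\left( 1 + \frac{1-p}{p(2-p)} \right)^{k} - 1}, \]

which is at least one. Thus it is enough to show the inequality (36) for $k = r$. Now, substituting
$p = r/(\lambda + r)$ into (36) yields

\[ \left( \frac{r-1}{r} \right) \left( 1 - \frac{\lambda}{2\lambda + r} \right)^{r-2} \left( \left( 1 + \frac{\lambda(\lambda+r)}{r(2\lambda+r)} \right)^{r} - 1 \right) \le \left( 1 + \frac{\lambda}{\lambda + r} \right)^{2\lambda+2} \left( 1 - e^{-\lambda} \right). \tag{37} \]

For the inequality we use the fact that $(1 + x/n)^{n} \le (1 + x/(n+1))^{n+1} \le e^{x}$ if $n + x$ is positive and
$n \ge 1$. The term (37) is clearly decreasing in $r$; by using the restriction on the value of $r$ and then the
inequality $\log(1+x) \le x$, we have

\begin{align*}
\left( \frac{r-1}{r} \right) \left( 1 - \frac{\lambda}{2\lambda + r} \right)^{r-2} \left( \left( 1 + \frac{\lambda(\lambda+r)}{r(2\lambda+r)} \right)^{r} - 1 \right)
  &\le \exp\left\{ (2\lambda+2) \log\left( 1 + \frac{1}{1 + 3\lambda e^{\lambda}} \right) \right\} \left( 1 - e^{-\lambda} \right) \\
  &\le \exp\left\{ \frac{2\lambda+2}{1 + 3\lambda e^{\lambda}} \right\} \left( 1 - e^{-\lambda} \right). \tag{38}
\end{align*}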

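Taking the natural logarithm of (38), we have

\[ \frac{2\lambda+2}{1 + 3\lambda e^{\lambda}} + \log\left( 1 - e^{-\lambda} \right) = \frac{2\lambda+2}{1 + 3\lambda e^{\lambda}} - \sum_{i \ge 1} \frac{e^{-i\lambda}}{i}. \tag{39} \]

The final expression is smaller than the expression obtained by replacing the series with any of its partial
sums, and it is easy to see that by taking only one term in the sum, (39) is negative for $\lambda > 2$, and
taking three terms yields the proper inequality for $\lambda > 1$.

For the remaining term, the equation (35) continues to hold in this case, which proves the lemma. □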
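The key inequality in the proof of Lemma 9.8 can be examined numerically for parameters satisfying the restriction on $r$. The sketch below assumes that the quantity to be bounded by one is $\frac{k-1}{k(2-p)^{k-2}}\big((1 + \frac{1-p}{(2-p)p})^k - 1\big)$ and that the restriction reads $3\lambda^2 e^{\lambda} < r$. The function names and the sample parameters are hypothetical.

```python
from math import exp

def restriction_holds(r, lam):
    """The assumed restriction 3 * lam^2 * e^lam < r."""
    return 3 * lam ** 2 * exp(lam) < r

def term_36(k, p):
    """Quantity that the proof bounds by one, under the reading in the lead-in."""
    x = (1 - p) / ((2 - p) * p)
    return (k - 1) / (k * (2 - p) ** (k - 2)) * ((1 + x) ** k - 1)

def profile_36(r, lam):
    """Values of term_36 for k = 2, ..., r with p = r/(lam + r)."""
    p = r / (lam + r)
    return [term_36(k, p) for k in range(2, r + 1)]

if __name__ == "__main__":
    for r, lam in ((9, 1.0), (50, 1.5), (543, 3.0)):
        values = profile_36(r, lam)
        print(r, lam, restriction_holds(r, lam), values[0], values[-1])
```

In these examples the values increase with $k$, so the largest one occurs at $k = r$, matching the reduction to $k = r$ used in the proof.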



Theorem 9.9. For the values of $c_k$ defined previously, and the set of $k$ where either
$3\lambda^2 e^{\lambda} < r$ and $k \ge 1/(1-p)$ or $k < 1/(1-p)$, the error term from Theorem 7.1 is minimized for
$k = 1$.

Proof. This follows directly from Lemmas 9.6 and 9.8 and the easily verified fact

\[ E[c_1 P_1(W' = W+1 \mid \{X_i\})] = E[c_1 P_1(W' = W-1 \mid \{X_i\})] = \frac{r(1-p)}{2-p}. \]

□
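The conclusion of Theorem 9.9 can be illustrated with exact rational arithmetic. The sketch below computes $E[c_k P_k(W' = W+1 \mid \{X_i\})]$ as $c_k(1-p)p^k$ times the finite sum of Lemma 9.5 and compares it with $r(1-p)/(2-p)$ for every $1 \le k \le r$. It assumes $c_k = \lambda a_k / \big(k\binom{k+a_k-2}{k-1}(1-p)^{a_k}p^{k-1}\big)$, the finite-sum form of Lemma 9.5 with ratio $(1-p)^2/(p(2-p))$, and the restriction $3\lambda^2 e^{\lambda} < r$. All names and sample parameters are hypothetical.

```python
from fractions import Fraction
from math import comb, exp, floor

def a_k(k, p):
    """Maximum of one and the integer part of (k - 2)(1 - p)/p."""
    return max(1, floor((k - 2) * (1 - p) / p))

def expected_scaled_step(k, r, p):
    """E[c_k P_k(W' = W + 1 | {X_i})], under the assumptions in the lead-in."""
    lam = r * (1 - p) / p
    a = a_k(k, p)
    c = lam * a / (k * comb(k + a - 2, k - 1) * (1 - p) ** a * p ** (k - 1))
    x = (1 - p) ** 2 / (p * (2 - p))
    s = sum(comb(k, l + 1) * comb(k + l - 1, k - 1) * x ** l for l in range(k))
    return c * (1 - p) * p ** k * s / (2 - p) ** k

def upper_bound(r, p):
    """The common bound r(1 - p)/(2 - p)."""
    return r * (1 - p) / (2 - p)

def profile(r, lam):
    """Exact values for k = 1, ..., r with p = r/(lam + r)."""
    p = Fraction(r) / (lam + r)
    return [expected_scaled_step(k, r, p) for k in range(1, r + 1)], upper_bound(r, p)

if __name__ == "__main__":
    values, bound = profile(20, Fraction(1, 2))
    print(float(bound), [round(float(v), 4) for v in values[:5]])
```

Because both absolute-value terms in the error are non-negative, a larger expectation means a smaller error, so the value at $k = 1$ attaining the bound is what makes $k = 1$ optimal.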